Regex between > and < in R with stringr


How can I capture the string between > and < in R?

 d<-"\"id/56771\" target=\"_self\">children- , adolescents</a></li>\n\t\t\t<li><" 


str_extract(d,">+(.*?)+<") gives me

>children- , adolescents</a></li>\n\t\t\t<li>< 

I guess a second string command would do the trick, but I thought there might be something more direct...

You can use str_extract, although str_match may be better suited:

str_extract(d, ">.*?<")
[1] ">children- , adolescents<"

The trick here is the ? modifier, which tells the regex not to be greedy. Regex matching is greedy by default, which means it will match the longest string that fits the pattern.
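To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The input string x is a made-up example (not from the question), chosen so that it contains more than one "<" and the two patterns give visibly different results.

```r
library(stringr)

# Hypothetical input with two tags, so the two behaviours differ.
x <- "<b>one</b><i>two</i>"

# Greedy: .* grabs as much as possible, so the match runs to the last "<".
greedy <- str_extract(x, ">.*<")

# Non-greedy: .*? grabs as little as possible, so it stops at the first "<".
lazy <- str_extract(x, ">.*?<")

print(greedy)  # ">one</b><i>two<"
print(lazy)    # ">one<"
```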

This still leaves a bit of work to do, i.e. removing the first and last characters. One can do that by subsetting the string, or it might be easier to use str_match instead. It returns the pattern matches as a character matrix:

str_match(d, ">(.*?)<")
     [,1]                        [,2]
[1,] ">children- , adolescents<" "children- , adolescents"

(The two columns are 1. the entire match, and 2. the part matched by the pattern inside the brackets.)

This means it's a simple matter of returning the second element:

str_match(d, ">(.*?)<")[2]
[1] "children- , adolescents"
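As a rough sketch of some alternatives, the same result can probably be reached in a few other ways. The variable names via_match, via_sub and via_look are only illustrative. Indexing with [, 2] selects the capture-group column, which should stay correct if d ever holds more than one string (plain [2] would then pick the second row of the first column). The str_sub call is one way to do the "remove the first and last character" step, and the lookaround pattern keeps the delimiters out of the match entirely.

```r
library(stringr)

d <- "\"id/56771\" target=\"_self\">children- , adolescents</a></li>\n\t\t\t<li><"

# Select the capture-group column explicitly; works for vectors of strings too.
via_match <- str_match(d, ">(.*?)<")[, 2]

# Extract with the delimiters, then drop the first and last character.
via_sub <- str_sub(str_extract(d, ">.*?<"), 2, -2)

# Lookbehind (?<=>) and lookahead (?=<) require the delimiters
# without including them in the match.
via_look <- str_extract(d, "(?<=>).*?(?=<)")

print(via_match)
print(via_sub)
print(via_look)
```

All three variants should return "children- , adolescents" for this input; which one reads best is largely a matter of taste.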
