python - Help parsing between <pre> tags using BeautifulSoup
I am attempting to parse information out of a website using BeautifulSoup and Python. The HTML looks like the following. I want the parsed data to look like:
id definition
lysine biosynthesis - burkholderia pseudomallei 17
... and so on for the rest of the data, which sits in a similar place (inside the "pre" tags but outside the "a" tags).
How can I do this?
<pre>id definition
----------------------------------------------------------------------------------------------------
<a href="/kegg-bin/show_pathway?bpm00300">bpm00300</a> lysine biosynthesis - burkholderia pseudomallei 17
<a href="/kegg-bin/show_pathway?bpm00330">bpm00330</a> arginine and proline metabolism - burkholderia pse
<a href="/kegg-bin/show_pathway?bpm01100">bpm01100</a> metabolic pathways - burkholderia pseudomallei 171
<a href="/kegg-bin/show_pathway?bpm01110">bpm01110</a> biosynthesis of secondary metabolites - burkholder
</pre>
I have tried:
y = soup.find('pre')  # returns the data between the <pre> tags on the KEGG page
for a in y:
    z = a.string
This gave me:
id definition ----------------------------------------------------------------------------------------------------
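A likely explanation for the header showing up: iterating over a Tag walks its direct children in document order, and those children mix text nodes with tags. The sketch below assumes bs4 with the built-in "html.parser" and a shortened, hypothetical copy of the question's HTML; it just labels each child of the <pre> so you can see that the first one is the header text, not an <a> tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical, shortened copy of the <pre> block from the question.
html = """<pre>ID        Definition
--------------------
<a href="/kegg-bin/show_pathway?bpm00300">bpm00300</a>  lysine biosynthesis - burkholderia pseudomallei 17
<a href="/kegg-bin/show_pathway?bpm00330">bpm00330</a>  arginine and proline metabolism - burkholderia pse
</pre>"""

soup = BeautifulSoup(html, "html.parser")
pre = soup.find("pre")  # a Tag object, not a plain string

# Walk the direct children: text nodes and <a> tags are interleaved.
kinds = []
for child in pre.children:
    if isinstance(child, Tag):
        kinds.append("tag")
        print("tag: ", child.name, child.string)
    else:
        kinds.append("text")
        print("text:", repr(str(child)))
```

The first child printed is the text node holding the header and dashed line, so a loop over the <pre> itself meets that header before any link.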
Thanks for the help!
BeautifulSoup() and its search methods return a hierarchical parse-tree object, not a string. Iterating through findChildren() on the node you found gets you what you want (and skips the header line):
for a in soup.find('pre').findChildren():
    z = a.string
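The loop above reads a.string from each <a> tag, which yields the pathway ID (e.g. bpm00300); the definition text the question asks for is the text node that follows each link. A minimal sketch, assuming bs4 with "html.parser" and a shortened hypothetical copy of the HTML, pairs each ID with its definition through next_sibling. It uses find_all("a"), of which findChildren() is an older bs4 alias; parse_pathways is a hypothetical helper name.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical, shortened copy of the <pre> block from the question.
html = """<pre>ID        Definition
--------------------
<a href="/kegg-bin/show_pathway?bpm00300">bpm00300</a>  lysine biosynthesis - burkholderia pseudomallei 17
<a href="/kegg-bin/show_pathway?bpm00330">bpm00330</a>  arginine and proline metabolism - burkholderia pse
</pre>"""


def parse_pathways(markup):
    """Return (id, definition) pairs from a KEGG-style <pre> listing."""
    soup = BeautifulSoup(markup, "html.parser")
    pre = soup.find("pre")
    rows = []
    # find_all("a") returns only tags, so the header text is skipped.
    for a in pre.find_all("a"):
        pathway_id = a.get_text()
        # The definition is the text node right after each <a> tag.
        sibling = a.next_sibling
        if isinstance(sibling, NavigableString):
            definition = str(sibling).strip()
        else:
            definition = ""
        rows.append((pathway_id, definition))
    return rows


print("id\tdefinition")
for pathway_id, definition in parse_pathways(html):
    print(f"{pathway_id}\t{definition}")
```

Stripping the sibling text removes the padding spaces and trailing newline that the <pre> layout adds around each definition.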