Parse HTML using Python and Beautiful Soup
<div class="profile-row clearfix"><div class="profile-row-header">member since</div><div class="profile-information">january 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">aiga chapter</div><div class="profile-information">alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">title</div><div class="profile-information">owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">company</div><div class="profile-information">mad dog graphx</div></div>

I'm using Beautiful Soup to parse the HTML above. I want to search through the code and pull out the data "january 2010", "alaska", "owner", and "mad dog graphx". All of this data has the same class, but each value has a different label ("member since", "aiga chapter", etc.) in the div just before it. How can I search for "member since" and get "january 2010", and then do the same for the other three fields?
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<div class="profile-row clearfix"><div class="profile-row-header">member since</div><div class="profile-information">january 2010</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">aiga chapter</div><div class="profile-information">alaska</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">title</div><div class="profile-information">owner</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">company</div><div class="profile-information">mad dog graphx</div></div>
... ''')
>>> for row in soup.findAll('div', {'class':'profile-row clearfix'}):
...     field, value = row.findAll(text = True)
...     print field, value
...
member since january 2010
aiga chapter alaska
title owner
company mad dog graphx

You can of course do whatever you want with field and value: create a dict from them, or store them in a database.
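For readers on Python 3, here is a minimal sketch of the same loop that collects the pairs into a dict. It assumes the newer `bs4` package (Beautiful Soup 4) rather than the `BeautifulSoup` module imported in the session above. The function name `profile_to_dict` and the variable `HTML` are made up for illustration.

```python
from bs4 import BeautifulSoup

HTML = '''<div class="profile-row clearfix"><div class="profile-row-header">member since</div><div class="profile-information">january 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">aiga chapter</div><div class="profile-information">alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">title</div><div class="profile-information">owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">company</div><div class="profile-information">mad dog graphx</div></div>
'''


def profile_to_dict(html):
    """Map each row's header text to its value text (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    profile = {}
    # class_ with one class name matches any div whose class list contains it
    for row in soup.find_all("div", class_="profile-row"):
        # Each row holds exactly two text nodes here: the label and the value
        field, value = row.find_all(string=True)
        profile[str(field)] = str(value)
    return profile


profile = profile_to_dict(HTML)
print(profile["member since"])
```

With the dict in hand, looking up a field by its label is a plain key access such as `profile["company"]`.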
If there are other divs or other text nodes within the "profile-row clearfix" div, you'll need to do field = row.find('div', {'class':'profile-row-header'}).findAll(text=True), etc.
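The more defensive lookup could be sketched as follows, again assuming Beautiful Soup 4 (`bs4`) on Python 3. The sample markup is a hypothetical variation of the question's HTML with extra whitespace, an extra `<span>`, and nested tags. The name `extract_profile` is likewise invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical variation of the markup: whitespace, an extra span, nested tags
MESSY_HTML = '''
<div class="profile-row clearfix">
  <div class="profile-row-header">member since</div>
  <div class="profile-information">january 2010</div>
  <span class="note">verified</span>
</div>
<div class="profile-row clearfix">
  <div class="profile-row-header">aiga chapter</div>
  <div class="profile-information"><b>alaska</b></div>
</div>
'''


def extract_profile(html):
    """Look up header and value by class instead of by text-node position."""
    soup = BeautifulSoup(html, "html.parser")
    profile = {}
    for row in soup.find_all("div", class_="profile-row"):
        header = row.find("div", class_="profile-row-header")
        info = row.find("div", class_="profile-information")
        if header is None or info is None:
            continue  # skip rows that lack either part
        # get_text(strip=True) joins nested text and trims surrounding whitespace
        profile[header.get_text(strip=True)] = info.get_text(strip=True)
    return profile


messy_profile = extract_profile(MESSY_HTML)
print(messy_profile)
```

Because each part is found by its own class, stray text nodes or extra elements inside a row no longer break the two-value unpacking.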