ruby on rails - How to parse nested ul/li tags using Hpricot
I have the following HTML structure:
    <div id='my_categories'>
      <ul>
        <li><a href="1">animals, birds, & pets</a></li>
        <li><a href="2">ask expert</a>
          <ul>
            <li><a href='21'>health care providers</a></li>
            <li><a href='22'>influnza</a>
              <ul>
                <li><a href='221'>flu viruses (2)</a></li>
                <li><a href='222'>test</a></li>
              </ul>
            </li>
          </ul>
        </li>
      </ul>
    </div>
This is how the web page looks.
What I need is this: I have a categories table with the fields category_name, category_url, and parent_id.
I need to save each category and sub-category. parent_id denotes the category that a sub-category comes under.
How can I parse this HTML structure using Hpricot and save the data to the database? Please help.
My table looks like this:

    id  category_name            category_url  parent_id
    1   animals, birds, & pets   null          null
    2   ask expert               null          null
    3   health care providers    null          2
    4   influenza                null          2
    5   flu viruses              null          4
    6   test                     null          4
Thanks in advance.
The code below worked for me:
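    require 'hpricot'
    require 'open-uri'

    # Category is assumed to be the ActiveRecord model backing the categories table
    doc = Hpricot(open(categories_page).read)

    # Visit every <li> matched by "ul/li"
    doc.search("ul/li").each do |li|
      # The first link in the <li> is the category itself; strip counts such as " (2)"
      category_name = li.search('a[@href]').first.inner_text.gsub(/ *\(.*?\)/, '')
      category_url  = li.search('a').first[:href]
      category = Category.find_or_create_by_name(category_name, :url => category_url)
      puts "---------- #{category.name} ------------"

      # Links found in lists nested under this <li> are treated as its sub-categories
      nodes = li.search("ul/li/a")
      unless nodes.empty?
        nodes.each do |node|
          node_name = node.inner_text.gsub(/ *\(.*?\)/, '')
          node_url  = node.attributes['href']
          sub_category = Category.find_by_name(node_name)
          if sub_category.blank?
            # New sub-category: create it under the current category
            sub_category = Category.create(:name => node_name, :url => node_url, :parent_category_id => category.id)
            puts "  #{sub_category.name}"
          else
            # Existing record: point it at the current category
            sub_category.update_attribute('parent_category_id', category.id)
            puts "  #{category.name} --> #{sub_category.name}"
          end
        end
      end
    end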
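For anyone who wants to see the parent_id bookkeeping on its own, here is a minimal sketch that leaves out Hpricot and the database entirely. It assumes the nested list has already been turned into nested hashes; `sample_tree`, `clean_name` and `flatten_categories` are hypothetical names made up for this illustration, not part of Hpricot or Rails. It only shows how a depth-first walk can hand each row the id of its parent, and it reuses the same regex as above to strip counts like " (2)".

```ruby
# Hypothetical in-memory tree mirroring the categories in the question.
# In practice it would be built from the parsed <ul>/<li> elements.
def sample_tree
  [
    { :name => 'animals, birds, & pets', :url => '1', :children => [] },
    { :name => 'ask expert', :url => '2', :children => [
      { :name => 'health care providers', :url => '21', :children => [] },
      { :name => 'influenza', :url => '22', :children => [
        { :name => 'flu viruses (2)', :url => '221', :children => [] },
        { :name => 'test', :url => '222', :children => [] }
      ] }
    ] }
  ]
end

# Strip an item count such as " (2)" from a link text.
def clean_name(text)
  text.gsub(/ *\(.*?\)/, '')
end

# Depth-first walk: each node takes the next free id, and its children
# receive that id as their parent_id. Rows come out in document order.
def flatten_categories(nodes, parent_id = nil, rows = [])
  nodes.each do |node|
    id = rows.size + 1
    rows << { :id => id,
              :category_name => clean_name(node[:name]),
              :category_url => node[:url],
              :parent_id => parent_id }
    flatten_categories(node[:children], id, rows)
  end
  rows
end

flatten_categories(sample_tree).each do |row|
  puts row.values_at(:id, :category_name, :category_url, :parent_id).inspect
end
```

In a real app the `rows << ...` line would be replaced by a model create call, and the id of the saved record (rather than a counter) would be passed down to the children.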