ruby on rails - How to parse nested ul/li tags using Hpricot


I have the following HTML structure:

    <div id='my_categories'>
      <ul>
        <li><a href="1">animals, birds, & pets</a></li>
        <li><a href="2">ask expert</a>
          <ul>
            <li><a href='21'>health care providers</a></li>
            <li><a href='22'>influenza</a>
              <ul>
                <li><a href='221'>flu viruses (2)</a></li>
                <li><a href='222'>test</a></li>
              </ul>
            </li>
          </ul>
        </li>
      </ul>
    </div>

This is how the web page looks:

(image not available)

What I need: I have a categories table with the fields category_name, category_url, and parent_id.

I need to save each category and sub-category. parent_id denotes the category that a sub-category belongs to.

How can I parse through this HTML structure using Hpricot and save the data to the database? Please help.

My table should look like this:

    id   category_name            category_url   parent_id
    1    animals, birds, & pets   null           null
    2    ask expert               null           null
    3    health care providers    null           2
    4    influenza                null           2
    5    flu viruses              null           4
    6    test                     null           4
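The mapping from nesting depth to parent_id can be sketched without a parser or a database. The snippet below is only an illustration under stated assumptions: the nested `tree` array stands in for whatever the HTML parser returns, `flatten_categories` is a hypothetical helper name, ids are assigned in document order, and each link's href is stored as category_url (the table above shows null there).

```ruby
# Hypothetical stand-in for the parsed <ul>/<li> structure:
# each node has a name, a url (the href), and a list of child nodes.
tree = [
  { name: 'animals, birds, & pets', url: '1', children: [] },
  { name: 'ask expert', url: '2', children: [
    { name: 'health care providers', url: '21', children: [] },
    { name: 'influenza', url: '22', children: [
      { name: 'flu viruses', url: '221', children: [] },
      { name: 'test', url: '222', children: [] }
    ] }
  ] }
]

# Walk the tree depth-first. Every node becomes one row, and the id of
# the row just created is passed down as parent_id for its children.
def flatten_categories(nodes, parent_id = nil, rows = [])
  nodes.each do |node|
    id = rows.size + 1 # ids follow document order (an assumption)
    rows << { id: id, category_name: node[:name],
              category_url: node[:url], parent_id: parent_id }
    flatten_categories(node[:children], id, rows)
  end
  rows
end

rows = flatten_categories(tree)
rows.each do |r|
  puts format('%-3d %-24s %-5s %s',
              r[:id], r[:category_name], r[:category_url], r[:parent_id].inspect)
end
```

With a real parser, the same recursion would look only at the direct `ul > li` children of each list item, so that each sub-category is attached to its immediate parent rather than to an ancestor further up.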

Thanks in advance.

The code below worked for me:

    require 'hpricot'
    require 'open-uri'

    # Category is assumed to be the ActiveRecord model for the categories table.
    doc = Hpricot(open(categories_page).read)
    doc.search("ul/li").each do |li|
      # Link text without a trailing count such as " (2)"
      category_name = li.search('a[@href]').first.inner_text.gsub(/ *\(.*?\)/, '')
      category_url = li.search('a').first[:href]
      category = Category.find_or_create_by_name(category_name, :url => category_url)

      puts "---------- #{category.name} ------------"
      nodes = li.search("ul/li/a")
      unless nodes.empty?
        nodes.each do |node|
          node_name = node.inner_text.gsub(/ *\(.*?\)/, '')
          node_url = node.attributes['href']
          sub_category = Category.find_by_name(node_name)
          if sub_category.blank?
            sub_category = Category.create(:name => node_name, :url => node_url, :parent_category_id => category.id)
            puts " #{sub_category.name}"
          else
            # Already saved: point it at the current category as its parent
            sub_category.update_attribute('parent_category_id', category.id)
            puts "  #{category.name} --> #{sub_category.name}"
          end
        end
      end
    end
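The `gsub` pattern in the code above is what turns a link text like "flu viruses (2)" into the plain category name. A small sketch of its behaviour on its own, where `strip_count` is a hypothetical helper name used only for illustration:

```ruby
# Remove optional leading spaces followed by a parenthesised group,
# using the same pattern as the code above.
def strip_count(label)
  label.gsub(/ *\(.*?\)/, '')
end

cleaned = ['flu viruses (2)', 'test', 'a (1) b (2)'].map { |label| strip_count(label) }
puts cleaned.inspect
```

Because the pattern is not anchored to the end of the string, it removes every parenthesised group, not only a trailing count. If a category name could legitimately contain parentheses, a stricter pattern such as `/ *\(\d+\)\z/` may be a safer choice.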
