python - problem scraping with BeautifulSoup -


I am trying to scrape the URL http://www.kat.ph/search/beatles/?categories[]=music using BeautifulSoup:

torrents = bs.findAll('tr', id=re.compile('^torrent_'))

torrents gets all the torrents on the page, so every element of torrents is a tr element.

My problem is that len(torrents[0].td) is 5, but I am not able to iterate over the td's. I mean, for x in torrents[0].td is not working.

The data I am getting for torrents[0]:

<tr class="odd" id="torrent_2962816"> <td class="fontsize12px torrentnamecell"> <div class="iaconbox floatedright"> <a title="torrent magnet link" href="magnet:?xt=urn:btih:0898a4b562c1098eb69b9b801c61a51d788df0f5&amp;dn=the+beatles+2009+greatest+hits+cdrip+ikmn+reupld&amp;tr=http%3a%2f%2ftracker.publicbt.com%2fannounce" onclick="_gaq.push(['_trackevent', 'download', 'magnet link', 'music']);" class="imagnet icon16"></a> <a title="download torrent file" href="http://torrage.com/torrent/0898a4b562c1098eb69b9b801c61a51d788df0f5.torrent?title=[kat.ph]the.beatles.2009.greatest.hits.cdrip.ikmn.reupld" onclick="_gaq.push(['_trackevent', 'download', 'download torrent file', 'music']);" class="idownload icon16"></a> <a class="ipartner2 icon16" href="http://www.downloadweb.org/checking.php?acode=b146a357c57fddd450f6b5c446108672&amp;r=d&amp;qb=vghliejlyxrszxmgwziwmdldiedyzwf0zxn0iehpdhmgq0rsaxatiglltu4gumvvugxk" onclick="_gaq.push(['_trackevent', 'download', 'download movie']);"></a> <a class="iverif icon16" href="/the-beatles-2009-greatest-hits-cdrip-ikmn-reupld-t2962816.html" title="verified torrent"></a> <a rel="2962816,0" class="icomment" href="/the-beatles-2009-greatest-hits-cdrip-ikmn-reupld-t2962816.html#comments_tab"> <span class="icommentdiv"></span>145     </a> </div> <div class="torrentname"> <a href="/the-beatles-2009-greatest-hits-cdrip-ikmn-reupld-t2962816.html" class="tortype musictype"></a> <a href="/the-beatles-2009-greatest-hits-cdrip-ikmn-reupld-t2962816.html">the <strong class="red">beatles</strong> [2009] greatest hits cdrip- ikmn reupld</a> <span>                 posted <a class="plain" href="/user/ikmn/">ikmn</a> <img src="http://static.kat.ph/images/verifup.png" alt="verified" /> in                      <span id="cat_2962816"> <a href="/music/">music</a> </span></span> </div> </td> <td class="nobr">168.26 <span>mb</span></td> <td>42</td> <td>1&nbsp;year</td> <td class="green">1368</td> <td class="red lasttd">94</td> </tr> 

I'd recommend using lxml instead of BeautifulSoup; among other great features, you can use XPath to grab the links:

import lxml.html
doc = lxml.html.parse('http://www.kat.ph/search/beatles/?categories[]=music')
links = doc.xpath('//a[contains(@class,"idownload")]/@href')
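
If you would rather stay with BeautifulSoup, the likely cause of the problem in the question is that torrents[0].td is shorthand for "the first td inside the row", so its length and iteration refer to that one cell's children, not to the row's cells. Below is a minimal sketch of iterating over every cell instead. It assumes the newer bs4 package (where findAll is spelled find_all) and runs against a trimmed, hypothetical copy of the row shown above rather than the live page; ROW_HTML, cells and cell_texts are names made up for the example.

```python
import re

from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of one result row (most links removed for brevity).
ROW_HTML = """
<table>
<tr class="odd" id="torrent_2962816">
  <td class="torrentnamecell"> <div class="iaconbox"></div> <div class="torrentname"><a href="/x.html">the beatles</a></div> </td>
  <td class="nobr">168.26 <span>mb</span></td>
  <td>42</td>
  <td>1&nbsp;year</td>
  <td class="green">1368</td>
  <td class="red lasttd">94</td>
</tr>
</table>
"""

soup = BeautifulSoup(ROW_HTML, "html.parser")

# Same idea as the query in the question: rows whose id starts with "torrent_".
torrents = soup.find_all("tr", id=re.compile(r"^torrent_"))
row = torrents[0]

# row.td is equivalent to row.find("td"): it is only the FIRST cell,
# so looping over it walks that cell's children, not the row's cells.
first_cell = row.td

# find_all("td") returns every cell in the row.
cells = row.find_all("td")

# Collapse runs of whitespace (including the non-breaking space) in each cell.
cell_texts = [" ".join(cell.get_text().split()) for cell in cells]

for text in cell_texts[1:]:
    print(text)
```

Here cell_texts[1:] skips the name cell and leaves the size, file count, age, seeders and leechers columns in page order.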
