python - How do I define which spider the Scrapy shell uses?

I'm trying to test out some XPaths using the Scrapy shell, but it seems to be calling on my incomplete spider module to do the scraping, which is not what I want. Is there a way to define which spider Scrapy uses in the shell? More to the point, why is Scrapy doing this; shouldn't it know that the spider is not ready for use? That's why I'm using the shell, right? Otherwise I'd be using

scrapy crawl spider_name

if I wanted to use a specific spider.

Edit: After looking at the spider docs forever, I found the following description of the spider instance used in the shell.

spider - the spider which is known to handle the URL, or a BaseSpider object if there is no spider found for the current URL

This means that Scrapy has correlated the URL with my spider and is using it instead of BaseSpider. Unfortunately, my spider is not ready for testing, so is there a way to force the shell to use BaseSpider instead?

Scrapy automatically selects a spider based on its allowed_domains attribute. If there is more than one spider for a given domain, Scrapy will use BaseSpider.

But since it's a Python shell, you can instantiate whichever spider you want.

 >>> from myproject.spiders.myspider import MySpider
 >>> spider = MySpider()
 >>> spider.parse_item(response)

Edit: As a workaround, to keep the shell from using your spider you can set allowed_domains = [] on it.
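
To make the selection rule above a bit more concrete, here is a rough pure-Python sketch of it. This is not Scrapy's actual code: the classes DefaultSpider, FinishedSpider and UnfinishedSpider and the functions handles and pick_spider are hypothetical names made up for illustration, and the domain matching is my assumption of how a URL is compared against allowed_domains. It only mirrors the behaviour described in this answer: exactly one matching spider gets picked, and anything else falls back to the generic spider.

```python
from urllib.parse import urlparse


class DefaultSpider:
    """Hypothetical stand-in for the generic fallback (BaseSpider in the text)."""
    name = "default"
    allowed_domains = []


class FinishedSpider:
    """Hypothetical spider that is ready to use."""
    name = "finished"
    allowed_domains = ["example.com"]


class UnfinishedSpider:
    """Hypothetical spider that is not ready yet."""
    name = "unfinished"
    allowed_domains = []  # the workaround: an empty list matches no URL


def handles(spider_cls, url):
    """Return True if the URL's host is one of the spider's domains or a subdomain."""
    host = urlparse(url).hostname or ""
    return any(
        host == domain or host.endswith("." + domain)
        for domain in spider_cls.allowed_domains
    )


def pick_spider(spider_classes, url):
    """Pick the single matching spider, else fall back to the default one."""
    matches = [cls for cls in spider_classes if handles(cls, url)]
    if len(matches) == 1:
        return matches[0]
    # No match, or an ambiguous match: use the generic spider.
    return DefaultSpider


chosen = pick_spider([FinishedSpider, UnfinishedSpider], "http://www.example.com/page")
print(chosen.name)
```

With allowed_domains emptied on the unfinished spider, it never matches a URL, so it can no longer be the one the shell ends up with.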
