python - How do I define which spider the scrapy shell uses?
I'm trying to test out some XPaths using the Scrapy shell, but it seems to be calling on my incomplete spider module to do the scraping, which is not what I want. Is there a way to define which spider Scrapy uses in the shell? More to the point, why is Scrapy doing this; shouldn't it know the spider is not ready for use? That's why I'm using the shell, right? Otherwise I'd be using
scrapy crawl spider_name if I wanted to use a specific spider.
Edit: After looking at the spider docs forever, I found the following description of the spider instance used in the shell.
spider - the Spider which is known to handle the URL, or a BaseSpider object if there is no spider found for the current URL
This means that Scrapy has correlated the URL with my spider and is using it instead of BaseSpider. Unfortunately, my spider is not ready for testing, so is there a way to force it to use BaseSpider for the shell instead?
Scrapy automatically selects a spider based on the allowed_domains attribute. If there is more than one spider for a given domain, Scrapy will use BaseSpider.
But since it's just a Python shell, you can instantiate any spider you want.
>>> from myproject.spiders.myspider import myspider
>>> spider = myspider()
>>> spider.parse_item(response)
Edit: As a workaround to keep the shell from using the spider, you can set allowed_domains = [] on it.
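To make the selection rule from this answer concrete, here is a minimal sketch in plain Python. It is not Scrapy's actual implementation and does not import Scrapy. The classes `DefaultSpider`, `ReadySpider`, `UnfinishedSpider` and the functions `handles` and `pick_spider` are all hypothetical names invented for illustration, and the domain-matching logic is an assumption about how such a rule might look.

```python
from urllib.parse import urlparse


class DefaultSpider:
    """Hypothetical stand-in for the generic fallback spider (BaseSpider above)."""
    name = "default"
    allowed_domains = []


class ReadySpider(DefaultSpider):
    """Hypothetical finished spider that claims example.com."""
    name = "ready"
    allowed_domains = ["example.com"]


class UnfinishedSpider(DefaultSpider):
    """Hypothetical work-in-progress spider using the workaround."""
    name = "unfinished"
    allowed_domains = []  # claims no domain, so it is never selected


def handles(spider_cls, url):
    """Return True if the URL's host is one of the spider's allowed domains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d)
               for d in spider_cls.allowed_domains)


def pick_spider(spider_classes, url):
    """Pick the single spider that handles the URL, else fall back to the default.

    Mirrors the rule described in the answer: zero matches or more than one
    match both result in the generic spider being used.
    """
    matches = [cls for cls in spider_classes if handles(cls, url)]
    return matches[0] if len(matches) == 1 else DefaultSpider
```

Under these assumptions, `pick_spider([ReadySpider, UnfinishedSpider], "http://www.example.com/page")` selects `ReadySpider`, while a list containing only `UnfinishedSpider` falls back to `DefaultSpider`, which is the effect the `allowed_domains = []` workaround relies on.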