.net - Is there a way to get files from a web server when directory listing is deactivated?
I am trying to build a "crawler" or "automatic downloader" for every file on a web server / web page.
In my opinion there are two cases:
1) Directory listing is enabled. That is easy: read the data in the listing and download every file I see.
2) Directory listing is disabled. What then? The only idea I have is to brute-force filenames and see how the server reacts (e.g. 404 for no file, 403 for a found directory, and data for a correctly found file).
Is my idea right? Is there a better way?
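For illustration, the brute-force idea could be sketched roughly like this in Python, using only the standard library. The function names `interpret_status` and `probe` are hypothetical, and the mapping from status code to meaning is just the guess described above: servers are free to answer differently, so a 403 does not reliably mean "directory".

```python
import urllib.error
import urllib.request


def interpret_status(status: int) -> str:
    """Map an HTTP status code to the guess described in the question."""
    if status == 200:
        return "found"
    if status == 403:
        return "forbidden (possibly a directory)"
    if status == 404:
        return "not found"
    return "inconclusive"


def probe(url: str) -> str:
    """Send a HEAD request to one candidate URL and interpret the reply."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return interpret_status(response.status)
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx replies; the code is on the exception.
        return interpret_status(error.code)
```

Calling `probe` once per candidate filename produces a lot of requests for very few hits, which is the main weakness of this approach.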
You can always parse the HTML and follow ('crawl') the links you get. This is the way most crawlers are implemented.
Check out these libraries that can help you do it:
.NET: Html Agility Pack
Python: Beautiful Soup
PHP: HTMLSimpleDom
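As a rough sketch of the link-following step, here is a small Python example. It uses the standard library's `html.parser` instead of any of the libraries listed above, purely to stay dependency-free; `LinkExtractor` and `extract_links` are hypothetical names, and `example.com` in the usage lines is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return all absolute link targets found in the given HTML text."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


page = '<a href="files/a.zip">a</a> <a href="/b.pdf">b</a>'
for link in extract_links(page, "http://example.com/dir/"):
    print(link)
```

A crawler would download each page, pass it through something like `extract_links`, and queue every link it has not visited yet. Files that no page links to stay invisible to this method.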
Always look for robots.txt in the site's root and make sure you respect the site's rules on which pages are allowed to be crawled.
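Checking those rules can be sketched with Python's standard `urllib.robotparser` module. The helper name `build_rules`, the user agent string `my-crawler`, and the sample rules are assumptions made up for this example; a real crawler would download the site's own robots.txt first.

```python
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file into a rule checker."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


# Sample rules for illustration only.
sample = "User-agent: *\nDisallow: /private/\n"
rules = build_rules(sample)

# Ask before fetching each URL.
print(rules.can_fetch("my-crawler", "http://example.com/public/a.zip"))
print(rules.can_fetch("my-crawler", "http://example.com/private/a.zip"))
```

Calling `can_fetch` before every download keeps the crawler within whatever the site owner has allowed.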