php - Web crawling and robots.txt - II -


A scenario similar to that of the previous question:

  1. Using wget, I typed the following to pull down images from a site (sub-folder):

     wget -r -A .jpg http://www.abc.com/images/ 
  2. I got 2 images from the above command - img1, img2.

  3. The index.php file in http://www.abc.com/images/ refers to img2.jpg (I saw the source).

  4. If I key in http://www.abc.com/images/img4.jpg or http://www.abc.com/images/img5.jpg, I get 2 separate images.

  5. But these images were not downloaded by wget.

  6. How should I go about retrieving the entire set of images under http://www.abc.com/images/?

I'm not sure, but you may want to try this:

wget --recursive --accept=gif,jpg,png http://www.abc.com 

This will:

  1. create a directory called www.abc.com\
  2. crawl the pages on www.abc.com
  3. save the .gif, .jpg or .png files inside the corresponding directories under www.abc.com\

You can then delete all the directories except the one you're interested in, namely, www.abc.com\images\
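The crawl-then-prune workflow can be scripted. A minimal sketch: the mock directory tree below stands in for the layout a recursive wget run would leave behind (the sub-folder names other than images are made up for illustration):

```shell
# Mock tree standing in for the result of:
#   wget --recursive --accept=gif,jpg,png http://www.abc.com
mkdir -p www.abc.com/images www.abc.com/css www.abc.com/scripts

# Delete every top-level sub-directory except the one we want to keep.
find www.abc.com -mindepth 1 -maxdepth 1 -type d ! -name images -exec rm -rf {} +

ls www.abc.com
```

After the find command, only images remains under www.abc.com.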

Crawling all the pages is a time-consuming operation, but it's the only way to make sure you get every image referenced from any of the pages on www.abc.com. There is no other way to detect which images are present inside http://abc.com/images/ unless the server allows directory browsing.
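If the server does allow directory browsing, the generated index page can be scraped directly instead of crawling the whole site. A minimal sketch in Python; the listing HTML below is a made-up Apache-style index for illustration, not fetched from www.abc.com:

```python
from html.parser import HTMLParser

class ImageLinkParser(HTMLParser):
    """Collect href targets that look like image files."""
    EXTENSIONS = (".gif", ".jpg", ".png")

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(self.EXTENSIONS):
                self.images.append(value)

# Made-up directory-index HTML, standing in for http://www.abc.com/images/
listing = """
<html><body><h1>Index of /images</h1>
<a href="../">Parent Directory</a>
<a href="img1.jpg">img1.jpg</a>
<a href="img4.jpg">img4.jpg</a>
<a href="notes.txt">notes.txt</a>
</body></html>
"""

parser = ImageLinkParser()
parser.feed(listing)
print(parser.images)  # -> ['img1.jpg', 'img4.jpg']
```

Each collected name can then be joined to the base URL and fetched individually, so only the image files are downloaded.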

