php - Web crawling and robots.txt - II
Similar scenario to part 1 of my previous question: using wget, I typed the following to pull down the images from a site (sub-folder):

wget -r -A .jpg http://www.abc.com/images/

I got 2 images from the above command - img1 and img2.
The index.php file in http://www.abc.com/images/ refers to img2.jpg (I saw the source). If I key in http://www.abc.com/images/img4.jpg or http://www.abc.com/images/img5.jpg directly, I get 2 more separate images. But these images are not downloaded by wget.

How should I go about retrieving the entire set of images under http://www.abc.com/images/?
I'm not sure, but you may want to try this:
wget --recursive --accept=gif,jpg,png http://www.abc.com
This will:

- create a directory called www.abc.com\
- crawl all the pages on www.abc.com
- save any .gif, .jpg or .png files inside the corresponding directories under www.abc.com\
You can then delete all the directories except the one you're interested in, namely www.abc.com\images\.
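If you'd rather skip the cleanup step, a variant worth trying (just a sketch, using the placeholder host from the question and a hypothetical output folder name) flattens every matching image into one local folder as it is found:

wget --recursive --no-directories --accept=gif,jpg,png --directory-prefix=abc-images http://www.abc.com

Here --no-directories drops the per-site directory tree and --directory-prefix puts everything wget keeps into abc-images/, so only the matching image files end up on disk.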
Crawling all the pages is a time-consuming operation, but it is the only way to make sure you get every image referenced by any of the pages on www.abc.com. There is no other way to detect which images are present inside http://abc.com/images/ unless the server allows directory browsing.
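If directory browsing does happen to be enabled on /images/, a narrower crawl should be enough (again a sketch against the placeholder URL from the question):

wget --recursive --no-parent --accept=gif,jpg,png http://www.abc.com/images/

The --no-parent option stops wget from climbing above /images/, so it only follows the links in the server's index listing for that folder instead of crawling the whole site.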