php - Web crawling and robots.txt - II -


A scenario similar to that of the previous question:

  1. Using wget, I typed the following to pull down images from a site (sub-folder):

     wget -r -A .jpg http://www.abc.com/images/ 
  2. I got 2 images from the above command - img1, img2.

  3. The index.php file in http://www.abc.com/images/ refers to img2.jpg (I saw the source).

  4. If I key in http://www.abc.com/images/img4.jpg or http://www.abc.com/images/img5.jpg, I get 2 separate images.

  5. But these images were not downloaded by wget.

  6. How should I go about retrieving the entire set of images under http://www.abc.com/images/?

I'm not sure, but you may want to try this:

wget --recursive --accept=gif,jpg,png http://www.abc.com 

This will:

  1. create a directory called www.abc.com\
  2. crawl the pages on www.abc.com
  3. save the .gif, .jpg or .png files inside the corresponding directories under www.abc.com\

You can then delete all the directories except the one you're interested in, namely, www.abc.com\images\
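The crawl-then-prune workflow can be scripted. A minimal sketch: the mock directory tree below stands in for the layout a recursive wget run would leave behind (the sub-folder names other than images are made up for illustration):

```shell
# Mock tree standing in for the result of:
#   wget --recursive --accept=gif,jpg,png http://www.abc.com
mkdir -p www.abc.com/images www.abc.com/css www.abc.com/scripts

# Delete every top-level sub-directory except the one we want to keep.
find www.abc.com -mindepth 1 -maxdepth 1 -type d ! -name images -exec rm -rf {} +

ls www.abc.com
```

After the find command, only images remains under www.abc.com.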

Crawling all the pages is a time-consuming operation, but it's the only way to make sure you get every image referenced from any of the pages on www.abc.com. There is no other way to detect which images are present inside http://abc.com/images/ unless the server allows directory browsing.
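If the server does allow directory browsing, the generated index page can be scraped directly instead of crawling the whole site. A minimal sketch in Python; the listing HTML below is a made-up Apache-style index for illustration, not fetched from www.abc.com:

```python
from html.parser import HTMLParser

class ImageLinkParser(HTMLParser):
    """Collect href targets that look like image files."""
    EXTENSIONS = (".gif", ".jpg", ".png")

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(self.EXTENSIONS):
                self.images.append(value)

# Made-up directory-index HTML, standing in for http://www.abc.com/images/
listing = """
<html><body><h1>Index of /images</h1>
<a href="../">Parent Directory</a>
<a href="img1.jpg">img1.jpg</a>
<a href="img4.jpg">img4.jpg</a>
<a href="notes.txt">notes.txt</a>
</body></html>
"""

parser = ImageLinkParser()
parser.feed(listing)
print(parser.images)  # -> ['img1.jpg', 'img4.jpg']
```

Each collected name can then be joined to the base URL and fetched individually, so only the image files are downloaded.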

