[NTLUG:Discuss] copying web documents

David Stanaway david at stanaway.net
Thu May 18 16:18:04 CDT 2006


David Stanaway wrote:
> Fred wrote:
>> I may be trying Jay's suggestion about a Windoze prog since wget has resisted
>> my puny efforts to make it work. Here's a thought: y'all try to get something
>> to copy the manual at the following URL and tell me how you did it. That way we
>> are on the (no pun intended) same page.
>>
>> http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/
>>
>> Thanks,
>> Fred
>>
> 
> Okay, this is what I did.
> 
> # Patch a private copy of the wget binary so it looks for a robots
> # file that does not exist. The replacement string is the same length
> # as the original, so the binary is not corrupted.
> $ cp `which wget` .
> $ sed -i 's/robots.txt/nobots.txt/g' wget
> $ ./wget -r -np \
>     -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060205 Epiphany/1.8.3 (Debian)' \
>     http://www.globalsecurity.org/military/
> 
> Alternatively, you could get the wget source and compile out the robots compliance.
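
For what it's worth, wget also has a documented knob for this, so patching
the binary may be unnecessary. Something like the following should work,
assuming your build honors the wgetrc "robots" command via -e (I have not
tried it against this particular site):

$ wget -r -np -e robots=off \
    -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060205 Epiphany/1.8.3 (Debian)' \
    http://www.globalsecurity.org/military/

Note that robots=off only skips the robots.txt check; the -U switch is
still needed if the server also filters on wget's User-Agent string.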

By the way, I hope you realize that this bypasses the stated intent of
the website's authors: wget is explicitly singled out in their
robots.txt, so they clearly do not want people keeping local copies of
the site.
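
You can check that yourself by fetching the robots file directly. wget
only consults robots.txt in recursive mode; a single explicit URL is
fetched as-is, so even an unpatched wget will show it to you:

$ wget -q -O - http://www.globalsecurity.org/robots.txt

Look for the stanza whose User-agent line names wget.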



