[NTLUG:Discuss] Scriptable, javascript-aware web browser OR virtual operator

Stuart Johnston saj at thecommune.net
Thu Aug 30 09:15:31 CDT 2007


Leroy Tennison wrote:
> I need to "screen scrape" a web page that is generated by filling in
> a form on a previous page. I haven't had any success finding a
> solution yet, so I'm hoping someone here can help.
> 
> I looked at Lynx and it is pretty capable as well as being easy to use,
> but unfortunately it doesn't have JavaScript support.  The search button
> is actually an 'input type=button ...' with an onclick handler that runs
> JavaScript.  Lynx displays the page but does not treat the button as a
> link.
> 
> I found a web reference stating that Mechanize doesn't support
> JavaScript either.

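You might try Mozilla::Mechanize or Win32::IE::Mechanize.  Both offer a
WWW::Mechanize-style interface but drive a real browser (Mozilla and
Internet Explorer, respectively), so the page's JavaScript actually
runs.  There is also Selenium, which is designed for web application
testing but might work for scraping as well.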

http://search.cpan.org/~slanning/Mozilla-Mechanize-0.05/
http://search.cpan.org/~abeltje/Win32-IE-Mechanize-0.009/
http://www.openqa.org/selenium/
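
Before setting up a full browser, it may be worth checking what the
button's onclick handler actually does.  If it only submits the
enclosing form (an assumption; I haven't seen your page), plain
Mechanize may be able to submit that form directly without running any
JavaScript.  Below is a rough Python sketch, standard library only,
that lists script-driven buttons together with the form they sit in.
SAMPLE_HTML, ButtonFinder and find_script_buttons are made-up names for
illustration, and the markup is invented, not taken from your site.

```python
from html.parser import HTMLParser

# Hypothetical page standing in for the real search form.
SAMPLE_HTML = """
<form name="search" action="/results.cgi" method="post">
  <input type="text" name="q">
  <input type="button" value="Search" onclick="document.search.submit()">
</form>
"""


class ButtonFinder(HTMLParser):
    """Collect input type=button elements and the form enclosing them."""

    def __init__(self):
        super().__init__()
        self.current_form = None  # attributes of the enclosing <form>, if any
        self.buttons = []         # one dict per button found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.current_form = attrs
        elif tag == "input" and (attrs.get("type") or "").lower() == "button":
            form = self.current_form or {}
            self.buttons.append({
                "label": attrs.get("value"),
                "onclick": attrs.get("onclick"),
                "form_action": form.get("action"),
                # HTML forms default to GET when no method is given.
                "form_method": (form.get("method") or "get").lower(),
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self.current_form = None


def find_script_buttons(html):
    """Return a list of dicts describing each input type=button in html."""
    finder = ButtonFinder()
    finder.feed(html)
    return finder.buttons


if __name__ == "__main__":
    for button in find_script_buttons(SAMPLE_HTML):
        print(button)
```

If onclick turns out to be something like document.search.submit(),
sending the form fields to form_action using form_method should produce
the same result page.  If the handler builds the request in script
instead, a real browser is probably the easier route.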


