[NTLUG:Discuss] Scriptable, javascript-aware web browser OR virtual operator

Leroy Tennison leroy_tennison at prodigy.net
Tue Aug 28 23:37:41 CDT 2007


I need to "screen scrape" a web page that is generated by filling in a
form on a previous page. I haven't found a solution yet, so I'm hoping
someone here can help.

I looked at Lynx, and it is quite capable as well as easy to use, but
unfortunately it doesn't support JavaScript. The search button is
actually an 'input type=button ...' whose onclick handler runs
JavaScript. Lynx displays the page, but it doesn't treat the button as
a link.

I found a web reference stating that Mechanize doesn't support
JavaScript either.

The starting page submits its form with a POST, so I can't just use
wget with a customized URL to reach the second page.
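Because a POST carries the form fields in the request body rather than in the URL, a script can often rebuild that body itself and skip the browser (GNU wget also accepts a --post-data option for the same purpose). The minimal Python sketch below assumes the onclick handler ultimately just submits the form, possibly after filling in hidden fields; if the script computes extra values, those have to be read out of the JavaScript and added by hand. The URL, the field names, and build_search_request are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(url, fields):
    """Build a POST request whose body is encoded the way a browser
    submits an HTML form (application/x-www-form-urlencoded)."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,  # the encoded form fields travel in the body, not the URL
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URL and field names; copy the real ones (including any
# hidden inputs) from the form's HTML source.
req = build_search_request(
    "http://example.com/search.cgi",
    {"name": "Smith", "city": "Dallas"},
)
print(req.get_method())  # POST
print(req.data)          # b'name=Smith&city=Dallas'
```

The request is only built here, not sent, so the encoded body can be compared against what the real browser transmits before anything hits the server.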

I'm aware that Greasemonkey lets me run JavaScript, but even if I could
get the result page that way, how could I save it to a file
automatically so that it can be parsed?
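However the result page ends up on disk, the parsing step can be done with Python's standard-library HTMLParser. This is a rough sketch that assumes the interesting data sits in HTML table cells (<td>); CellCollector, parse_html and parse_saved_page are hypothetical names, and a real page may need its own selection logic.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell strings per <tr>
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_cell = False
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def parse_html(text):
    """Return non-empty rows with the whitespace in each cell collapsed."""
    parser = CellCollector()
    parser.feed(text)
    parser.close()
    return [[" ".join(cell.split()) for cell in row] for row in parser.rows if row]


def parse_saved_page(path):
    """Parse a result page that was previously saved to a file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return parse_html(f.read())


sample = ("<table><tr><th>Name</th><th>City</th></tr>"
          "<tr><td>Smith, J.</td><td>Dallas <b>TX</b></td></tr></table>")
print(parse_html(sample))  # [['Smith, J.', 'Dallas TX']]
```

Header cells (<th>) are skipped on purpose, so the header row comes back empty and is filtered out.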

I've seen references to the 'expect' program and know it is somehow
related to Tcl. At this point I'd be willing to put up with the Tcl
learning curve if I knew it would get me to my goal. Does anyone know
whether expect can send a POST with form data and save the returned
page to a file?
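expect is a Tcl-based tool for scripting interactive programs, so for a plain form POST a short script in a general-purpose language may be more direct. The sketch below is Python, not expect, and assumes the server needs nothing beyond the form fields (no session cookie from the first page and no values computed by the page's JavaScript). post_and_save, the URL and the field names are hypothetical.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def post_and_save(url, fields, out_path, timeout=30):
    """POST form fields to url and write the raw response body to out_path."""
    body = urlencode(fields).encode("ascii")
    # With a body present, urllib sends a POST and adds the
    # application/x-www-form-urlencoded Content-Type header itself.
    req = Request(url, data=body, method="POST")
    with urlopen(req, timeout=timeout) as resp, open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream the page to disk unchanged
    return out_path


# Hypothetical usage:
# post_and_save("http://example.com/search.cgi",
#               {"name": "Smith", "city": "Dallas"},
#               "result.html")
```

If the site does set a cookie on the first page, the standard library's http.cookiejar module can be combined with urllib.request.build_opener to carry it across requests.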

Since this seems to be non-trivial, I'm also looking at alternatives.
Does anyone know of a program that accepts a script of keystrokes and
sends them to a JavaScript-aware browser (Firefox, Opera, or whatever)
as if they came from an interactive operator?

Does anyone know of another alternative? At this point I'm open to
considering almost anything.

Thanks for any and all help.
