[NTLUG:Discuss] Scriptable, javascript-aware web browser OR virtual operator
Leroy Tennison
leroy_tennison at prodigy.net
Tue Aug 28 23:37:41 CDT 2007
I need to "screen scrape" a web page that is generated by filling in a
form on a previous page. I haven't had any success finding a solution
yet, so I'm hoping someone here can help.
I looked at Lynx, which is quite capable and easy to use but
unfortunately has no JavaScript support. The search button is actually
an 'input type=button ...' whose onclick handler runs JavaScript. Lynx
displays the page but does not treat the button as a link.
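One way to see what such a button actually submits is to list each
form's action, method, field names, and onclick handlers from the page
source. The sketch below does this in Python with the standard
html.parser module. The class and function names are illustrative
only, and nothing here is taken from the real page.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect each form's action, method, field names and onclick code."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form> seen
        self._current = None   # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action"),
                # HTML forms default to GET when no method is given.
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
                "onclick": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current:
            if attrs.get("name"):
                self._current["fields"].append(attrs["name"])
            if attrs.get("onclick"):
                self._current["onclick"].append(attrs["onclick"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def inspect_forms(html_text):
    """Return a list of form descriptions found in html_text."""
    parser = FormInspector()
    parser.feed(html_text)
    return parser.forms
```

Run against the saved source of the starting page, inspect_forms()
would show where the form posts to and which handler the button calls,
which is the information needed to reproduce the submission without a
browser.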
I found a web reference stating that Mechanize doesn't support
JavaScript either.
The form on the starting page is submitted with POST, so I can't use
wget with a customized URL to go to the second page.
I'm aware that Greasemonkey allows using JavaScript but, if I were able
to get the return page using this approach, how could I save the result
as a file (in an automated way) so that it can be parsed?
I've seen references to the 'expect' program and know it is somehow
related to Tcl. At this point I'd be willing to tolerate the learning
curve for Tcl if I knew I would reach my goal by doing so. Does anyone
know if expect can do a POST supplying form data and save the returned
result to a file?
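Whatever tool ends up doing it, the underlying operation is a POST
carrying URL-encoded form fields, with the reply written to a file. A
minimal sketch in Python using only the standard urllib modules
follows. The function names are illustrative, and it assumes the
button's script does nothing more than submit the form, which may not
hold for the page in question.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Build a POST request carrying URL-encoded form fields."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying a data body makes urllib issue a POST rather than a GET.
    return urllib.request.Request(url, data=body)


def save_response(request, path):
    """Send the request and write the raw response body to a file."""
    with urllib.request.urlopen(request) as response:
        page = response.read()
    with open(path, "wb") as out:
        out.write(page)
```

With a placeholder URL and field name, calling
save_response(build_post_request("http://example.com/search",
{"query": "widgets"}), "result.html") would leave the returned page in
result.html, ready to be parsed.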
It seems that this goal is not trivial to achieve so I'm also looking at
alternatives. Does anyone know of a program which will accept a script
of keystrokes and send those keystrokes to a JavaScript-aware browser
(like Firefox or Opera, or whatever) as if they were coming from an
interactive operator?
Does anyone know of another alternative? At this point I'm open to
considering almost anything.
Thanks for any and all help.