[NTLUG:Discuss] pulling tables out of web pages.

Burton M. Strauss III Burton_Strauss at comcast.net
Wed Sep 15 11:44:02 CDT 2004


For one-off projects I usually use Awk to do this - the gsub() function lets
me strip all of the HTML tags, etc. - but it's always a custom job because
it depends so heavily on the actual input.  You could also (maybe) use CSS
with a custom style sheet plus Save As.
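
To give a rough idea of the tag-stripping approach, here is a minimal sketch
in Python rather than Awk, where re.sub() plays the part of gsub(). The
function names (html_table_to_tsv, merge_unique) are made up for
illustration, and it assumes the page has properly closed <tr>/<td> tags and
no nested tables, which real pages often violate, so expect to tweak it per
site.

```python
import re
from html import unescape

# Patterns for table rows, cells, and any leftover tag.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr\s*>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def html_table_to_tsv(html):
    """Return one tab-delimited line per table row found in the HTML text.

    Assumes well-formed, non-nested tables (an assumption, not a guarantee).
    """
    lines = []
    for row in ROW_RE.findall(html):
        cells = []
        for cell in CELL_RE.findall(row):
            text = TAG_RE.sub("", cell)    # strip tags, like gsub(/<[^>]+>/, "")
            text = unescape(text)          # decode entities such as &nbsp; and &amp;
            text = " ".join(text.split())  # collapse runs of whitespace
            cells.append(text)
        if cells:
            lines.append("\t".join(cells))
    return lines


def merge_unique(*pages):
    """Concatenate row lists from several pages, dropping duplicate lines
    while keeping first-seen order."""
    seen = set()
    merged = []
    for page in pages:
        for line in page:
            if line not in seen:
                seen.add(line)
                merged.append(line)
    return merged
```

Run html_table_to_tsv() on the saved text of each page, pass the results to
merge_unique(), and write the merged lines out to get the one big
de-duplicated table. If the page has layout tables around the data table,
you would still need to filter rows by hand (by column count, say).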

-----Burton

> -----Original Message-----
> From: discuss-bounces at ntlug.org [mailto:discuss-bounces at ntlug.org]On
> Behalf Of Bobby Wrenn
> Sent: Thursday, April 08, 2004 2:34 PM
> To: NTLUG Discussion List
> Subject: [NTLUG:Discuss] pulling tables out of web pages.
>
>
> I have tried some html2txt tools and have had no success.
>
> I need to convert a web page into a tab delimited file (preferably
> keeping only the data table). My goal is to do several of these pages
> and cat them into a big table and delete duplicates.
>
> I think I can handle most of the problem if I can just convert the html
> to a tab delimited text file.
>
> Anyone know of a reliable tool?
>
> Here is a sample of the web pages I am working on:
> http://partsurfer.hp.com/cgi-bin/spi/main?sel_flg=partlist&model=KAYAK+XU+6%2F266MT&HP_model=&modname=Kayak+XU+6%2F266MT&template=secondary&plist_sval=ALL&plist_styp=flag&dealer_id=&callingsite=&keysel=X&catsel=X&ptypsel=X&strsrch=&pictype=I&picture=X&uniqpic=
>
> TIA
> Bobby


_______________________________________________
https://ntlug.org/mailman/listinfo/discuss



