[NTLUG:Discuss] pulling tables out of web pages.

Victor Brilon victor at victorland.com
Thu Apr 8 16:48:18 CDT 2004


If you're ok with Perl, check out the HTML::* package hierarchy on CPAN. 
Lots of code already written to do this work.
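(For instance, HTML::TableExtract on CPAN does exactly this kind of table pull. If Perl isn't handy, the same idea sketched in stdlib-only Python — class and function names here are my own, not from any library — looks roughly like this:)

```python
# Stdlib-only sketch: walk <table> markup with html.parser and emit one
# tab-delimited line per <tr>. Names (TableToTSV, html_to_tsv) are made
# up for this example.
from html.parser import HTMLParser

class TableToTSV(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.row = []    # cells collected for the current <tr>
        self.rows = []   # completed, tab-joined rows

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
            self.row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.row:
            self.rows.append("\t".join(c.strip() for c in self.row))
            self.row = []

    def handle_data(self, data):
        if self.in_cell:
            self.row[-1] += data

def html_to_tsv(html):
    p = TableToTSV()
    p.feed(html)
    return "\n".join(p.rows)
```

A real page like the PartSurfer one will have nested markup and junk around the table, which is where the CPAN modules earn their keep.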

Victor

Bobby Wrenn wrote:
> Greg Edwards wrote:
> 
>> Bobby Wrenn wrote:
>>
>>> I have tried some html2txt tools and have had no success.
>>>
>>> I need to convert a web page into a tab delimited file (preferably 
>>> keeping only the data table). My goal is to do several of these pages 
>>> and cat them into a big table and delete duplicates.
>>>
>>> I think I can handle most of the problem if I can just convert the 
>>> html to a tab delimited text file.
>>>
>>> Anyone know of a reliable tool?
>>>
>>> Here is a sample of the web pages I am working on:
>>> http://partsurfer.hp.com/cgi-bin/spi/main?sel_flg=partlist&model=KAYAK+XU+6%2F266MT&HP_model=&modname=Kayak+XU+6%2F266MT&template=secondary&plist_sval=ALL&plist_styp=flag&dealer_id=&callingsite=&keysel=X&catsel=X&ptypsel=X&strsrch=&pictype=I&picture=X&uniqpic= 
>>>
>>>
>>> TIA
>>> Bobby
>>
>>
>>
>> If this is a one-time deal, read the file in with StarOffice Calc, 
>> then save it as a comma-delimited file (text CSV). Some of the other 
>> spreadsheet programs can do this as well.
>>
>> HTH
> 
> I have been using OOo with good results for the one-offs. But now I have 
> 24 files to process and just wanted a way to do them all at once.
> 
> Ultimately, I would like to build a tool that goes to the website, pulls 
> down the page, converts it, and saves the result to a file. This is (as 
> usual) going to be a learning process for me.
> 
> Thanks,
> Bobby
> 
> 
> _______________________________________________
> https://ntlug.org/mailman/listinfo/discuss
> 
> 


