[NTLUG:Discuss] pulling tables out of web pages.
Victor Brilon
victor at victorland.com
Thu Apr 8 16:48:18 CDT 2004
If you're ok with Perl, check out the HTML::* package hierarchy on CPAN.
Lots of code already written to do this work.
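If Perl isn't an option, here is a minimal sketch of the same idea using only the Python standard library (not the CPAN HTML::* modules): walk the HTML and emit every table row as a tab-delimited line. The function name and the assumption that cell text needs only whitespace trimming are mine, not from the thread.

```python
# Minimal sketch: extract <table> rows from an HTML page as
# tab-delimited text, using only the Python stdlib.
from html.parser import HTMLParser

class TableToTSV(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False   # True while inside a <td>/<th>
        self.row = []          # cells of the current row
        self.rows = []         # finished tab-joined rows

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
            self.row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.row:
            self.rows.append("\t".join(self.row))
            self.row = []

    def handle_data(self, data):
        if self.in_cell:
            self.row[-1] += data.strip()

def html_table_to_tsv(html):
    """Return all table rows in `html` as one tab-delimited string."""
    p = TableToTSV()
    p.feed(html)
    return "\n".join(p.rows)
```

This ignores nested tables and colspans, which a real HP PartSurfer page may well contain, so treat it as a starting point rather than a finished tool.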
Victor
Bobby Wrenn wrote:
> Greg Edwards wrote:
>
>> Bobby Wrenn wrote:
>>
>>> I have tried some html2txt tools and have had no success.
>>>
>>> I need to convert a web page into a tab delimited file (preferably
>>> keeping only the data table). My goal is to do several of these pages
>>> and cat them into a big table and delete duplicates.
>>>
>>> I think I can handle most of the problem if I can just convert the
>>> html to a tab delimited text file.
>>>
>>> Anyone know of a reliable tool?
>>>
>>> Here is a sample of the web pages I am working on:
>>> http://partsurfer.hp.com/cgi-bin/spi/main?sel_flg=partlist&model=KAYAK+XU+6%2F266MT&HP_model=&modname=Kayak+XU+6%2F266MT&template=secondary&plist_sval=ALL&plist_styp=flag&dealer_id=&callingsite=&keysel=X&catsel=X&ptypsel=X&strsrch=&pictype=I&picture=X&uniqpic=
>>>
>>>
>>> TIA
>>> Bobby
>>
>>
>>
>> If this is a one-time deal, read the file in with StarOffice Calc,
>> then save it as a comma-delimited file (text CSV). Some of the other
>> spreadsheet progs can do this as well.
>>
>> HTH
>
> I have been using OOo with good results for the one-offs. But now I have
> 24 files to process and just wanted a way to do them all at once.
>
> Ultimately, I would like to build a tool to go to the website, pull down
> the page, convert it, and save the result to a file. This is (as
> usual) going to be a learning process for me.
>
> Thanks,
> Bobby
>
>
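The later step Bobby describes, concatenating the per-page tables and deleting duplicates, can be sketched as follows (Python; the function name and the assumption that the inputs are already tab-delimited strings are mine):

```python
# Merge several tab-delimited tables into one, dropping duplicate rows
# while keeping first-seen order.
def merge_unique(tables):
    """tables: list of TSV strings; returns one deduplicated TSV string."""
    seen = set()
    out = []
    for tsv in tables:
        for line in tsv.splitlines():
            if line and line not in seen:
                seen.add(line)
                out.append(line)
    return "\n".join(out)
```

Using a set rather than `sort -u` preserves the original row order, which matters if the first row of each page is a header you want kept exactly once.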
> _______________________________________________
> https://ntlug.org/mailman/listinfo/discuss
>
>