[NTLUG:Discuss] Re: good "book" format for html? -- DocBook is more simple, more universal

Mon Nov 29 00:23:05 CST 2004

Bryan J. Smith wrote:

>On Sun, 2004-11-28 at 20:08, Brian wrote:
> ...
>
>>Early attempts to get DocBook up and running were simply not worth
>>the effort required.

Used to be that way; IIRC, you had to have at least 4 different 
packages, plus all of their dependencies, to run DocBook.  I think 
it's down to 3 now.  Still too much effort for me. :-)

>...
>
>The original poster (and correct me if I am wrong) seems interested in
>building a parser around a simplistic and universal language for books. 
>DocBook is definitely far more appropriate markup than HTML for this. 
>At least it has quite a bit of work already done on it, and you
>can convert between it and countless other editing/publication formats.

Ah, I think we just found the source of the misunderstanding.  
[...re-reading original post...]  Nope, I never mentioned the word 
"parser" or anything close to it. (Please take that as a correction. :-)

I have multiple inputs:  ASCII text, HTML, and PDF; I'm the author of 
some, but not of most.  I store them all for future use and transport 
them to various devices (running different OSes) for reading.  The goal 
is to future-proof them and make them convenient to transport.  To me, 
HTML is the safest bet for that among formats that still have 
markup/formatting capabilities.  Plus it's easy to grep for text in HTML 
if I'm searching for something. :-)  Plus my PDA program wants HTML as 
input.  (PDF comes close to fulfilling my needs for future-proofing, but 
I'm bothered by its "closed nature", i.e. how proprietary it is.  Alas, 
PDF also displays very poorly on my PDA, while HTML comes out very nice.)

Now to be completely honest, when I'm transforming PDF into HTML, I do 
run the output of "pdftohtml" thru a perl script I wrote to "clean up" 
the output, but that's just to simplify the HTML and to insert P tags 
where they belong.  But that has nothing to do with my original 
question. :-)
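
For readers curious what such a cleanup pass might look like, here is a 
minimal sketch in Python.  It is not the perl script mentioned above 
(which isn't shown); it assumes the converter ends every line with a 
<br> tag and that two or more <br> tags in a row mark a paragraph 
break.  The function name insert_paragraphs is made up for this example.

```python
import re

# Assumption: two or more consecutive <br> tags mean "paragraph break".
PARAGRAPH_BREAK = re.compile(r'(?:\s*<br\s*/?>\s*){2,}', re.IGNORECASE)
# A lone <br> is treated as a wrapped line inside one paragraph.
LINE_BREAK = re.compile(r'\s*<br\s*/?>\s*', re.IGNORECASE)


def insert_paragraphs(html_body: str) -> str:
    """Wrap <br>-separated text in <p> tags (hypothetical helper)."""
    paragraphs = []
    for chunk in PARAGRAPH_BREAK.split(html_body):
        # Join wrapped lines with a single space and trim the ends.
        text = LINE_BREAK.sub(' ', chunk).strip()
        if text:
            paragraphs.append(f'<p>{text}</p>')
    return '\n'.join(paragraphs)


if __name__ == '__main__':
    sample = 'First line<br>\nstill first.<br><br>Second para.<br>'
    print(insert_paragraphs(sample))
```

Real converter output is usually messier than this, so the two patterns 
would need tuning against actual files.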

To restate my original question a bit differently:  Are there any 
(preferably open) *standard* formats and/or programs that can 
create/utilize HTML files with their supporting data files (e.g. images) 
embedded in them?  Then I can uphold my standard of 1 book 1 file, yet 
still have it all in a standard, easy to create & use format that will be 
useful for a long time to come (and, when it stops being a standard, will 
be easy enough to update to the "new long term thing").  [XHTML is 
probably going to be the next long term thing, but as it's mostly 
HTML 4.0 compatible, I'm not too worried right now.]
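
One standard mechanism that fits the "1 book 1 file" idea is the data: 
URL scheme (RFC 2397), which lets an image be written directly into the 
src attribute as base64 text.  MHTML (RFC 2557) is another option.  
Support for either varies between browsers and readers, so check your 
target devices first.  Below is a minimal sketch in Python, assuming 
images are referenced by quoted, relative src attributes on <img> tags.  
The function name inline_images is made up for this example.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches the src attribute of an <img> tag; assumes the value is quoted.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)
# Anything with a URL scheme (http:, data:, ...) or starting with // is left alone.
HAS_SCHEME = re.compile(r'^(?:[a-z][a-z0-9+.-]*:|//)', re.IGNORECASE)


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> references with data: URLs."""
    def replace(match):
        prefix, quote, src = match.groups()
        if HAS_SCHEME.match(src):
            return match.group(0)
        path = base_dir / src
        if not path.is_file():
            # Missing file: keep the original reference untouched.
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or 'application/octet-stream'
        payload = base64.b64encode(path.read_bytes()).decode('ascii')
        return f'{prefix}{quote}data:{mime};base64,{payload}{quote}'

    return IMG_SRC.sub(replace, html)


if __name__ == '__main__':
    import sys
    # Usage: python inline_images.py book.html > book-single.html
    source = Path(sys.argv[1])
    sys.stdout.write(inline_images(source.read_text(), source.parent))
```

The result is still plain HTML, so it stays greppable for text, though 
base64 makes each embedded image roughly a third larger than the 
original file.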

>...

Kevin