[NTLUG:Discuss] Re: good "book" format for html? -- DocBook is more simple, more universal

Mon Nov 29 00:03:27 CST 2004

On Mon, 2004-11-29 at 00:44, Kevin Brannen wrote:
> I'm not sure what "structure and capability" you mean that HTML doesn't 
> have.  You'll need to define your terms.

It's far too free-form IMHO.

> HTML has structure to me.  It can markup (& do the right thing for) 
> paragraphs, show headers for chapters, and can create TOC and an index, 
> not to mention HR for section separators if you need them.  It has all 
> the basic capabilities (text markup features) necessary for the basic 
> book/article/HowTo.  It can change fonts (size, color, style), justify, 
> and you can insert images (more on this in a second).  With the 
> exception of some niche needs (e.g. math & music books, plus maybe a few 
> other specialties which I'm not worrying about), I have yet to run into 
> anything not in HTML that I need.

Yes, you can do it all in HTML.  But how do you parse it?  How do you
know something is the Book Title, the TOC, a Chapter Title, an Index
reference?  How are those things built automatically?

If your desire is to do it all in HTML, you surely can.  And you can
parse the HTML.  But then that requires assumptions on what each thing
is.  Then you're into defining those "meta-tags."
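To make that concrete, here is a minimal Python sketch using the standard library's `html.parser`.  It assumes a hypothetical house convention (`<h1>` is the book title, `<h2>` is a chapter title); HTML itself defines no such meaning, so the mapping exists only in the parser code.

```python
from html.parser import HTMLParser

# Hypothetical convention: <h1> = book title, <h2> = chapter title.
# Nothing in HTML says so; this mapping lives only in this script.
ROLE_BY_TAG = {"h1": "book_title", "h2": "chapter_title"}


class OutlineParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.current_role = None  # role of the heading we are inside, if any
        self.outline = []         # list of (role, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in ROLE_BY_TAG:
            self.current_role = ROLE_BY_TAG[tag]
            self.outline.append((self.current_role, ""))

    def handle_endtag(self, tag):
        if tag in ROLE_BY_TAG:
            self.current_role = None

    def handle_data(self, data):
        # Only collect text that sits inside one of the "meaningful" headings.
        if self.current_role is not None:
            role, text = self.outline[-1]
            self.outline[-1] = (role, text + data)


def html_outline(markup):
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return parser.outline


doc = """<h1>Sample Novel</h1>
<h2>Chapter 1</h2><p>...</p>
<h2>Chapter 2</h2><p>...</p>"""
print(html_outline(doc))
# [('book_title', 'Sample Novel'), ('chapter_title', 'Chapter 1'), ('chapter_title', 'Chapter 2')]
```

Any page that happens to use `<h2>` for, say, a "Contents" heading gets silently reported as a chapter; that is exactly the assumption problem.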

> Of course, MathML may come to the rescue for the math niche.

And MathML can be converted to HTML.  But going the other way is, eh, a
little rougher.  So if you write your equations in HTML now, how do you
get to something like MathML later?

> Imagine you're a Robert Heinlein and you have a new novel (e.g. "Space 
> Cadet" picking a book at random off my shelf).  For something like that, 
> what do you need that is not in HTML?

There's nothing stopping you from writing in PostScript.  There's also
nothing stopping you from writing in anything that outputs HTML or
PostScript, for that matter.  But at what point do you lose control over
editing it?  Parsing it?

Yes, there are the different _style_ tags.  But are they structure? 
There is really no difference between the _structure_ of the title page,
TOC, chapters, sections, etc... in HTML.  How will your parser handle
this?

Again, I can only see you going back to defining some meta-tags.
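By contrast, in DocBook the element names themselves are the structure.  A minimal Python sketch with `xml.etree.ElementTree`, using a small illustrative DocBook 4-style fragment (made up for this example and not validated against the DTD), pulls out the book title and every component without any guessing:

```python
import xml.etree.ElementTree as ET

# Illustrative DocBook-style fragment (not DTD-validated). The element
# names (book, preface, chapter, appendix) carry the structure.
BOOK_XML = """<book>
  <title>Sample Novel</title>
  <preface><title>Preface</title><para>...</para></preface>
  <chapter id="ch1"><title>Chapter One</title><para>...</para></chapter>
  <chapter id="ch2"><title>Chapter Two</title><para>...</para></chapter>
  <appendix><title>Family Trees</title><para>...</para></appendix>
</book>"""


def outline(xml_text):
    """Return (element_name, title) for the book and each top-level component."""
    book = ET.fromstring(xml_text)
    result = [(book.tag, book.findtext("title"))]
    for part in book:
        if part.tag == "title":
            continue  # already recorded as the book title
        result.append((part.tag, part.findtext("title")))
    return result


for kind, title in outline(BOOK_XML):
    print(f"{kind:10} {title}")
```

No convention has to be agreed on outside the document: a chapter is a chapter because it is tagged `<chapter>`.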

> except maybe the Appendices but images would probably come to the rescue
> there for the genealogy trees.

But how do you differentiate between the portions of the book:  title,
copyright page, TOC, chapters, appendices, etc...?

> Why do I care?  Because my HowTo's turn into novellas. :-)  Not to 
> mention I write my own short stories in my free time, as well as collect 
> ebooks (preferably in open formats which limits what I can buy but 
> that's life).

And that's cool.  But how about considering a language that doesn't
limit how you can publish?

Let's say a major publisher takes interest.  If you could give them a
PDF version with 1 command that is easily reproduced by a "Print
Quantity Needed" machine, would that not save them time?

What about playing with the layout?  Maybe some users like a single
HTML, some like HTML pages by chapter, some by section, some like a
framed HTML layout?

What about inter-references in the book?  What about inter-references
between books in a series?  Many typesetting systems are simple to
learn and already handle creating these for you, automagically.
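As a toy illustration of that automation (not how the DocBook XSL stylesheets actually implement it), the Python sketch below numbers chapters in document order and generates the text for each `<xref linkend="..."/>` from the target's `id`.  The fragment and chapter names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative DocBook-style fragment: each <xref linkend="..."/> points
# at the id of another element instead of hard-coding "Chapter 2".
XREF_XML = """<book>
  <chapter id="launch"><title>Launch Day</title>
    <para>See <xref linkend="landing"/> for what happens next.</para>
  </chapter>
  <chapter id="landing"><title>Landing</title>
    <para>As described in <xref linkend="launch"/>, ...</para>
  </chapter>
</book>"""


def resolve_xrefs(xml_text):
    """Return the generated label for every xref, in document order."""
    root = ET.fromstring(xml_text)
    # Number chapters in document order and index the labels by id.
    labels = {}
    for number, chapter in enumerate(root.iter("chapter"), start=1):
        labels[chapter.get("id")] = f'Chapter {number}, "{chapter.findtext("title")}"'
    # Look up each reference by its linkend; flag anything dangling.
    return [labels.get(x.get("linkend"), "[unresolved reference]") for x in root.iter("xref")]


print(resolve_xrefs(XREF_XML))
```

Swap the two chapters in the source and every reference is renumbered on the next run; nobody has to hunt down "see Chapter 2" by hand.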

> I don't need an 18 blade Swiss army knife, my little 4 bladed one (HTML 
> :-) works just fine.

But DocBook is even easier than HTML.  It's pure content, no style
crap.  Consistency and simplicity, not free-form and unmaintainable.

> Don't care what it was designed for.  It has a standard (4.0), the 
> standard meets my needs (with 1 exception).  I don't even need CSS, 
> though I recognize that's there if I need it.

To each his own.

I just saw "parsers" and realized you're either going to have one big
frustration, or you're going to have to implement your own "meta-tags"
to manage the flow of your document.

If you're going to do that, you might as well consider creating your own
XML instance of HTML, so that it makes life easy for parsers.  Otherwise
there is no structure to the document, other than assumptions in how the
HTML should be created.

Which is, again, very difficult for parsers.

-- 
Bryan J. Smith                                    b.j.smith at ieee.org 
-------------------------------------------------------------------- 
Subtotal Cost of Ownership (SCO) for Windows being less than Linux
Total Cost of Ownership (TCO) assumes experts for the former, costly
retraining for the latter, omitted "software assurance" costs in 
compatible desktop OS/apps for the former, no free/legacy reuse for
latter, and no basic security, patch or downtime comparison at all.