XML: Single root element

From: Jürgen Purtz <juergen(at)purtz(dot)de>
To: pgsql-docs(at)lists(dot)postgresql(dot)org
Subject: XML: Single root element
Date: 2019-01-30 10:34:12
Message-ID: f8b59177-1251-8813-541f-73383aa744f5@purtz.de
Lists: pgsql-docs

Some time ago we converted our documentation from SGML to XML in one
huge step. Most of the resulting files are well-formed, but not all.
The well-formedness constraint is violated by files that contain more
than one root element. You can locate such files with the command:

xmllint --noout *.sgml ref/*.sgml 2> >(grep Extra)
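
For anyone without xmllint at hand, the same check can be approximated
with Python's standard library. This is only a sketch, not part of the
patch: the helper name has_extra_root and the two sample strings are
made up for illustration, and a file that references an undefined
entity such as &version; before its second top-level element would
fail with a different error first.

```python
import xml.etree.ElementTree as ET
from xml.parsers.expat import errors

# Expat's numeric code for "junk after document element", the error raised
# when content follows the end of the first top-level element.
EXTRA_CONTENT = errors.codes[errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT]


def has_extra_root(text):
    """Return True if parsing stops because of content after the first root."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return exc.code == EXTRA_CONTENT
    return False


# Hypothetical, shortened versions of legal.sgml before and after the change.
old_legal = "<date>2019</date>\n<copyright><year>1996-2019</year></copyright>"
new_legal = (
    "<bookinfo><date>2019</date>"
    "<copyright><year>1996-2019</year></copyright></bookinfo>"
)

print(has_extra_root(old_legal), has_extra_root(new_legal))
```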

In itself this is not a serious problem. But for further XML processing
(parsing, DocBook upgrade to version 5.x, use of an XML editor,
XInclude, XPath, namespaces, ...) it is necessary, or at least very
helpful, to turn every single file into a *well-formed* XML file, in
particular one with a single root element. This has to be done
manually, file by file. The attached patch results from applying
different strategies to achieve this aim.

Strategy 1: Move the element of the outer file that contains the
'calling' entity reference into the included file, where it becomes the
single top-level element. Example 'legal.sgml':

Actual situation
================
postgres.sgml:
<book id="postgres">
 <title>PostgreSQL &version; Documentation</title>

 <bookinfo>
  <corpauthor>The PostgreSQL Global Development Group</corpauthor>
  <productname>PostgreSQL</productname>
  <productnumber>&version;</productnumber>
  &legal;
 </bookinfo>
 ...

legal.sgml:
<date>2019</date>

<copyright>
 <year>1996-2019</year>
 <holder>The PostgreSQL Global Development Group</holder>
</copyright>

<legalnotice id="legalnotice">
...
</legalnotice>
-- End of File --

New situation
=============
postgres.sgml:
<book id="postgres">
 <title>PostgreSQL &version; Documentation</title>

 &legal;
 ...

legal.sgml:
<bookinfo>
 <corpauthor>The PostgreSQL Global Development Group</corpauthor>
 <productname>PostgreSQL</productname>
 <productnumber>&version;</productnumber>
 <date>2019</date>

 <copyright>
  <year>1996-2019</year>
  <holder>The PostgreSQL Global Development Group</holder>
 </copyright>

 <legalnotice id="legalnotice">
 ...
 </legalnotice>
</bookinfo>
-- End of File --

Some individual files change, but the intermediate file (or the
in-memory document) that results from resolving all entities remains
unchanged. This resolved document is the basis for all further steps
such as validation or output generation.
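
The claim that the resolved document stays the same can be illustrated
with a small Python sketch. It does not use the real build; entity
resolution is simulated by plain string replacement, and the function
expand as well as the shortened file contents are assumptions made for
this example only.

```python
import xml.etree.ElementTree as ET


def expand(outer, entities):
    """Simulate entity resolution by replacing &name; with its text."""
    for name, text in entities.items():
        outer = outer.replace("&" + name + ";", text)
    return outer


# Old layout: <bookinfo> lives in the outer file, legal.sgml has two roots.
old_outer = (
    "<book><title>Docs</title>"
    "<bookinfo><productname>PostgreSQL</productname>&legal;</bookinfo>"
    "</book>"
)
old_legal = "<date>2019</date><legalnotice>...</legalnotice>"

# New layout: <bookinfo> moved into legal.sgml as its single root.
new_outer = "<book><title>Docs</title>&legal;</book>"
new_legal = (
    "<bookinfo><productname>PostgreSQL</productname>"
    "<date>2019</date><legalnotice>...</legalnotice></bookinfo>"
)

old_doc = expand(old_outer, {"legal": old_legal})
new_doc = expand(new_outer, {"legal": new_legal})

# Compare the canonical (C14N) form of both resolved documents.
same = ET.canonicalize(old_doc, strip_text=True) == ET.canonicalize(
    new_doc, strip_text=True
)
print(same)
```

ET.canonicalize needs Python 3.8 or later; comparing the two expanded
strings directly would also work here because the sample has no
whitespace differences.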

Strategy 2: The release notes files consist of many sect1 elements at
the top level. To overcome this, one can change sect1 to sect2, sect2
to sect3, ... and use a new sect1 element as a wrapper around the
complete file. But the chain of sect<n> elements is limited to 5
levels, and in some cases we already use all of them. Therefore it's
necessary to change the mark-up from sect<n> elements to section
elements, which can be nested recursively without limit.
This strategy changes the visual representation of the TOC, because
every title element shifts one level down. (In my opinion this is an
improvement because a: after clicking on 'Release Notes' we currently
get 372 items plus their sub-items. This will be reduced to one item
per major release: 11, 10, 9.6, 9.5, ... and b: the acknowledgement
element is shown, as intended, once per complete major release, not
only with the very first version of a release.) Furthermore, the
standard HTML output then has exactly one HTML file per major release.
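
The mark-up change of strategy 2 can be sketched in Python as follows.
This is not how the patch was produced; the function
wrap_release_notes, its title parameter and the sample fragment are
hypothetical, and real release note files contain entity references
that this simple parser call would reject.

```python
import re
import xml.etree.ElementTree as ET

# Matches the numbered DocBook section elements sect1 ... sect5.
SECT_TAG = re.compile(r"^sect[1-5]$")


def wrap_release_notes(fragment, title):
    """Wrap a multi-root fragment in one <section> and rename sect<n>."""
    # The new outer element gives the file its single root.
    root = ET.fromstring("<section>" + fragment + "</section>")
    for elem in root.iter():
        if SECT_TAG.match(elem.tag):
            elem.tag = "section"  # recursive element, no depth limit
    # Give the wrapper its own title (one TOC entry per major release).
    title_elem = ET.Element("title")
    title_elem.text = title
    root.insert(0, title_elem)
    return root


# Hypothetical, shortened release notes fragment with two top-level sect1.
fragment = (
    '<sect1 id="release-11-1"><title>Release 11.1</title>'
    "<sect2><title>Changes</title></sect2></sect1>"
    '<sect1 id="release-11"><title>Release 11</title></sect1>'
)
wrapped = wrap_release_notes(fragment, "Release 11")
print(ET.tostring(wrapped, encoding="unicode"))
```

Attributes such as id are kept as they are; only the element names
change, which is why every title ends up one level deeper in the TOC.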

Strategy 3: Split huge files into smaller files (contrib, xfunc) and/or
shift some sections to the calling file. From the perspective of a git
user, or of someone who translates the documentation into a different
language, this is no fun, but I hope that it will be accepted.

PS_1: For tests, don't forget the Make target 'errcodes-table.sgml'.
PS_2: The remaining files version.sgml, filelist.sgml and
ref/allfiles.sgml, which contain nothing but entity definitions, will
possibly change or become superfluous with the migration to DocBook 5.x.

Kind regards
Jürgen Purtz

Attachment Content-Type Size
XmlWellFormed.patch text/x-patch 498.6 KB