
# Migration of PG's documentation from DocBook 4.5 to DocBook 5.2

The migration from DocBook 4.x to 5.2 is a major step that changes most
of PG's sgml files and all xsl files. DocBook supports the migration with some scripts;
see https://docbook.org/docs/howto/howto.html.

Unfortunately, PG's documentation doesn't meet all prerequisites for using DocBook's
scripts directly. One of them, db4-upgrade.xsl, has been slightly modified (see
comments starting with 'jup'). Additional bash, Perl, and sed commands
solve generic and file-specific problems. This work is very specific to PG's sources.
So that the migration can be repeated at any point in time, all changes are
done by scripts.

The scripts were developed for PG versions 13 through 17.


## Major changes

- Discontinuation of the DOCTYPE declaration. Instead, an XML
  **namespace** uniquely identifies the DocBook tags.
- Discontinuation of DTDs (and XSD schemas) for XML validation. Instead,
  validation is done against a RELAX NG schema.
- DocBook namespace in all xsl scripts.
- Some tag names change (see: https://docbook.org/docs/howto/howto#changes-renamed)
  in order to follow XML conventions and standards; others are
  removed (see: https://docbook.org/docs/howto/howto#changes-removed).
  The content model of some tags is narrowed down and defined more precisely.
- Some examples (see: https://docbook.org/docs/howto/howto#changes):
```
# 'id' is now 'xml:id'

# replace 'ulink' by 'link'
# DocBook 4
<ulink url="https://docbook.org">DocBook site</ulink>
# DocBook 5 external URI (similar to HTML anchor 'href');
# the 'xlink' prefix must be bound to http://www.w3.org/1999/xlink
<link xlink:href="https://docbook.org">DocBook site</link>
# DocBook 5 internal reference (with 'linkend' attribute)
<link linkend="pg_wal">Write-Ahead-Log</link>
# or the empty element
<xref linkend="pg_wal"/>

# in DocBook 5 ALL elements can directly use 'linkend':
# DocBook 4
<link linkend='dir'><command>DIR</command></link>
# can be changed in DocBook 5 to:
<command linkend='dir'>DIR</command>
```
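
The two renames shown above can be tried on a single line with a small sed filter. This is only a sketch for illustration: the actual conversion is done by db4-upgrade.xsl, the function name `db4_to_db5_sketch` is made up, and a regular expression cannot cope with attributes that are spread across several lines.

```bash
#!/usr/bin/env bash
# Illustration only: regex-based renaming of two DocBook 4 constructs.
# db4_to_db5_sketch is a hypothetical helper, not part of the migration scripts.
db4_to_db5_sketch() {
  sed -E \
    -e 's/<ulink url=/<link xlink:href=/g' \
    -e 's#</ulink>#</link>#g' \
    -e 's/(<[A-Za-z][A-Za-z0-9]*[^<>]*[[:space:]])id=/\1xml:id=/g'
}

# Reads DocBook 4 markup on stdin, writes the renamed markup to stdout.
printf '%s\n' '<sect1 id="intro"><ulink url="https://docbook.org">DocBook site</ulink></sect1>' \
  | db4_to_db5_sketch
```

The third expression only touches an `id=` attribute that is preceded by whitespace inside a start tag, so `linkend=` and an already converted `xml:id=` are left alone.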

## Migration steps

The migration is driven by conv.sh. The script uses three directories: scripts and
other necessary migration files are located in **$ToolDir**, the existing sgml files
are located in **$FromSgmlDir**, and the migrated ones are in **$ToSgmlDir**.

1. Preparation: The git tree of the complete PG source gets copied to a different
   place (**$FromSgmlDir** is one part of it). Hence, we can use 'diff'
   after any intermediate step to check the changes so far.
2. Migration:  
   2.1 All changes are done in **$ToSgmlDir**.  
   2.2 Change Makefile to use Jing and introduce postgres-full.xml for old PG versions.  
   2.3 Add namespace to all xsl scripts. Refer to version 1.79.2 of the DocBook XSLT stylesheets.  
   2.4 Perform some general modifications on every sgml file to make all of them
       XML-conformant.  
   2.5 Perform individual changes on some sgml files (doRealChanges.sh).  
   2.6 Perform the standard DocBook migration 4.x -> 5.x for every sgml file.  
   2.7 Revert the general modifications done in step 2.4.  
3. Validation: Perform validation against the RELAX NG schema. This is done with
   Jing because the error messages delivered by xmllint are not helpful.  
4. Check results by comparing old/new sgml, html, txt, and man files via diff.
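
The core of these steps can be outlined as a small driver. The following is a hypothetical sketch, not the real conv.sh: the schema file name `docbook.rng`, the `DRY_RUN` switch, the default directory values, and the function names are all assumptions, and steps 2.2 to 2.5 and 2.7 are left out. By default it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Hypothetical outline of a conv.sh-like driver; NOT the real script.
# Assumed names: docbook.rng, DRY_RUN, run, migrate_file, validate, compare.

ToolDir=${ToolDir:-./tools}                        # db4-upgrade.xsl, schema, helpers
FromSgmlDir=${FromSgmlDir:-./pg-orig/doc/src/sgml} # untouched copy of the sources
ToSgmlDir=${ToSgmlDir:-./pg-new/doc/src/sgml}      # all changes happen here
DRY_RUN=${DRY_RUN:-1}                              # 1 = print commands only

run() {
  # Print the command; execute it only when DRY_RUN=0.
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

migrate_file() {
  # Step 2.6 for one file: DocBook 4.x -> 5.x via the upgrade stylesheet.
  run xsltproc -o "$ToSgmlDir/$1.db5" "$ToolDir/db4-upgrade.xsl" "$ToSgmlDir/$1"
  run mv "$ToSgmlDir/$1.db5" "$ToSgmlDir/$1"
}

validate() {
  # Step 3: validate the assembled document against the RELAX NG schema.
  run jing "$ToolDir/docbook.rng" "$ToSgmlDir/postgres-full.xml"
}

compare() {
  # Step 4: compare old and new sources.
  run diff -r "$FromSgmlDir" "$ToSgmlDir"
}

migrate_file ref/psql-ref.sgml
validate
compare
```

Keeping every command behind `run` makes it easy to inspect the planned actions before anything in **$ToSgmlDir** is modified.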


## Introduction of a new tool

In the past, we used **xmllint** to validate the sgml files against the DocBook
DTD. This worked well. Its validation against a RELAX NG schema also works well as long
as no schema violation occurs. But if an sgml file violates the RELAX NG schema,
the resulting error messages are more confusing than helpful.

Therefore, we should consider introducing another validator. During the migration phase,
we used **jing** (20181222+dfsg2-6). It is written in Java, it is fast, and its error
messages are very precise. There are many other validators: https://relaxng.org/#validators.
Should we switch completely to Jing for validation? Note that Jing is not able to
produce postgres-full.xml.


### Installation of **jing** on Ubuntu:
```
sudo apt install jing
sudo apt install libavalon-framework-java  # (... possibly more)
# you need a Java runtime
```
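
Once a validator is installed, the check itself is a single command. The sketch below prefers Jing and falls back to xmllint. The function names `choose_validator` and `validate_rng` are made up for this example, and the file names in the usage note are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: validate an XML file against a RELAX NG schema (XML syntax).
# choose_validator and validate_rng are hypothetical helper names.

choose_validator() {
  if command -v jing >/dev/null 2>&1; then
    echo jing
  elif command -v xmllint >/dev/null 2>&1; then
    echo xmllint
  else
    echo none
  fi
}

validate_rng() {  # usage: validate_rng schema.rng file.xml
  case "$(choose_validator)" in
    jing)    jing "$1" "$2" ;;
    xmllint) xmllint --noout --relaxng "$1" "$2" ;;
    *)       echo "no RELAX NG validator found" >&2; return 127 ;;
  esac
}

echo "validator: $(choose_validator)"
```

A call would look like `validate_rng docbook.rng postgres-full.xml`; both tools return a non-zero exit status when the document does not validate.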


## Status
* Pure SGML
  * Looks good (in general)
  * Some tags like `<xref ...>` or `<replaceable ...>` have many or long attributes that are often spread across multiple lines. During the migration, such lines are joined into one line. A subsequent step tries to restore the original formatting, but fails in many cases.

* Postgres.txt
  * Identical except for a few whitespace differences.

* HTML single and multiple pages
  * Looks ok
  * Missing the text of tag `<address>` within `<confgroup>` (in DocBook 4.5 as well as in 5.2)

* Raw man pages
  * Missing content of 'Author:'
  * Additional instructions:  \fI  ...  \fR

* pdf
  * Slightly different page or font size

* epub: I'm not able to produce epub output, neither for DocBook 4.5 nor for
      DocBook 5.2 files.

* Makefile: PG 16+ uses the file postgres-full.xml. Older versions don't know this file and create output directly from postgres.sgml by including what the entities define. This entity resolution (in xsltproc) loses the namespaces. The consequence is that almost none of the xsl templates match. A 'quick-and-dirty' solution adds postgres-full.xml to the old Makefiles. It may be worth adopting the 16+ solution for older PG versions.
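
The idea behind postgres-full.xml is to resolve all entities in a separate step, so that xsltproc only sees one fully expanded file. A minimal sketch of that step, assuming xmllint performs the expansion; the function name `build_full_xml` and the exact option set are assumptions and may differ from the real Makefile rule:

```bash
#!/usr/bin/env bash
# Sketch: expand all entity references of postgres.sgml into a single file.
# build_full_xml is a hypothetical name; run it inside doc/src/sgml.
build_full_xml() {
  src=${1:-postgres.sgml}
  out=${2:-postgres-full.xml}
  # --noent substitutes entity references by their content,
  # --path tells xmllint where to look for the referenced files,
  # --output writes the expanded document to a file instead of stdout.
  xmllint --path . --noent --output "$out" "$src"
}
```

Calling `build_full_xml` without arguments reads postgres.sgml and writes postgres-full.xml, which can then be passed to xsltproc or to a validator.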


## ToDo

* 'Install' targets of Makefile as well as 'epub' target
* Adaptation of doc/src/sgml/Makefile
* Adaptation of Appendix J: Documentation
* Adaptation of README.link


## Forecast

* Entities: We use **character entities** (e.g.: \&mdash;) as well as **parameter entities**
(e.g.: %filelist;). Using character entities instead of hex values or literal
Unicode characters is helpful because it improves the readability of the source for authors.
The parameter entities could, in theory, be replaced by the more XML-conformant
XInclude mechanism. But this isn't possible without major changes in most files:
  * Every xml/sgml file must be XML-conformant on its own; in particular, it must have a single root element.
  * Every xml/sgml file must re-declare its namespace(s). The reason is that parameter
    entities perform a plain text substitution, whereas xi:include creates trees and combines
    them. When such subtrees are combined, namespaces are intentionally
    not inherited: each file knows only its own namespaces.
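
To make the two requirements concrete, the following sketch writes a minimal pair of files in the layout XInclude would need. All file names, ids, and titles are invented for the example; it resolves the include only if xmllint happens to be installed.

```bash
#!/usr/bin/env bash
# Sketch of the file layout XInclude would require (all names are made up).
work=$(mktemp -d)

# Included file: exactly one root element, DocBook namespace declared again.
cat > "$work/chapter1.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook" xml:id="demo-chapter">
  <title>Demo chapter</title>
  <para>Text of the chapter.</para>
</chapter>
EOF

# Including file: xi:include takes the place of the entity reference.
cat > "$work/book.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude" version="5.2">
  <title>Demo book</title>
  <xi:include href="chapter1.xml"/>
</book>
EOF

# Resolve the include if xmllint is available.
if command -v xmllint >/dev/null 2>&1; then
  xmllint --xinclude --output "$work/book-full.xml" "$work/book.xml"
fi
echo "files written to $work"
```

Without the `xmlns` declaration in chapter1.xml, the `chapter` element would end up in no namespace after inclusion, and the namespaced xsl templates would not match it.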


