From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(at)eisentraut(dot)org> |
Subject: | fixing bookindex.html bloat |
Date: | 2022-02-13 20:16:18 |
Message-ID: | 20220213201618.qz6p6noon3wagr3f@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
Sometime last year I was surprised to see (not on a public list unfortunately)
that bookindex.html is 657kB, with > 200kB just being repetitions of
xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink"
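One way to check a figure like this is to count how often each declaration appears and multiply by its length. The sketch below is only an illustration: the function name `namespace_overhead` and the sample markup are made up here, and it measures the bare declaration text without the separating whitespace.

```python
# The two declarations quoted in the message.
DECLARATIONS = (
    'xmlns="http://www.w3.org/1999/xhtml"',
    'xmlns:xlink="http://www.w3.org/1999/xlink"',
)


def namespace_overhead(html):
    """Map each declaration to (occurrences, bytes those occurrences occupy)."""
    result = {}
    for decl in DECLARATIONS:
        count = html.count(decl)
        result[decl] = (count, count * len(decl.encode("utf-8")))
    return result


# Hypothetical sample resembling repeated index entries; not real output.
sample = (
    '<dt xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">entry</dt>\n'
) * 3

overhead = namespace_overhead(sample)
for decl, (count, nbytes) in overhead.items():
    print(f"{count:6d} x {decl} = {nbytes} bytes")
```

To apply it to a real build, read the generated file into a string first (for example the contents of html/bookindex.html) and pass that to `namespace_overhead`.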
Reminded of this, due to a proposal to automatically generate docs as part of
cfbot runs (which'd be fairly likely to update bookindex.html), I spent a few
painful hours last night trying to track this down.
The reasons for the two xmlns= are different. The
xmlns="http://www.w3.org/1999/xhtml" is afaict caused by confusion on our
part.
Some of our stylesheets use
xmlns="http://www.w3.org/TR/xhtml1/transitional"
others use
xmlns="http://www.w3.org/1999/xhtml"
It's noteworthy that the docbook xsl stylesheets end up with
<html xmlns="http://www.w3.org/1999/xhtml">
so it's a bit pointless to reference http://www.w3.org/TR/xhtml1/transitional
afaict.
Adding xmlns="http://www.w3.org/1999/xhtml" to stylesheet-html-common.xsl gets
rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in bookindex specific
content.
Changing stylesheet.xsl from transitional to http://www.w3.org/1999/xhtml gets
rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in navigation/footer.
Of course we should likely change all http://www.w3.org/TR/xhtml1/transitional
references, rather than just the one necessary to get rid of the xmlns= spam.
So far, so easy. It took me way longer to understand what's causing
all the xmlns:xlink= appearances.
For a long time I was misdirected because if I remove the <xsl:template
name="generate-basic-index"> in stylesheet-html-common.xsl, the number of
xmlns:xlink drastically reduces to a handful, which made me think that their
existence is somehow our fault. And I tried and tried to find the cause.
But it turns out that this is originally caused by a still existing buglet in
the docbook xsl stylesheets, specifically autoidx.xsl. It doesn't list xlink
in exclude-result-prefixes, but uses ids etc from xlink.
The reason that we end up with so many more xmlns:xlink is just that without
our customization there ends up being a single
<div xmlns:xlink="http://www.w3.org/1999/xlink" class="index">
and then everything below that doesn't need the xmlns:xlink anymore. But
because stylesheet-html-common.xsl emits the div, the xmlns:xlink is emitted
for each element that autoidx.xsl has "control" over.
Waiting for docbook to fix this seems a bit futile. I eventually found a
bug report about this, from 2016: https://sourceforge.net/p/docbook/bugs/1384/
But we can easily reduce the "impact" of the issue, by just adding a single
xmlns:xlink to <div class="index">, which is sufficient to convince xsltproc
to not repeat it.
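The reason a single declaration suffices is ordinary XML namespace scoping: a prefix declared on an element is in scope for all of its descendants. The sketch below illustrates only that scoping rule with Python's standard library, not xsltproc's serializer; the `<a xlink:href=...>` children and file names are hypothetical stand-ins for index content.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Declaration repeated on every element that uses the prefix.
repeated = (
    '<div class="index">'
    f'<a xmlns:xlink="{XLINK}" xlink:href="a.html"/>'
    f'<a xmlns:xlink="{XLINK}" xlink:href="b.html"/>'
    '</div>'
)

# Declaration written once on the enclosing element.
hoisted = (
    f'<div xmlns:xlink="{XLINK}" class="index">'
    '<a xlink:href="a.html"/>'
    '<a xlink:href="b.html"/>'
    '</div>'
)


def expanded_attrs(xml_text):
    """Return (tag, attributes) per element, with namespace prefixes expanded."""
    return [(el.tag, dict(el.attrib)) for el in ET.fromstring(xml_text).iter()]


# Both documents parse to the same elements and namespaced attributes.
same_meaning = expanded_attrs(repeated) == expanded_attrs(hoisted)
# The hoisted form is shorter by one declaration per additional child.
bytes_saved = len(repeated) - len(hoisted)
print(same_meaning, bytes_saved)
```

With only two children the saving is a single declaration; in an index with thousands of entries the same effect adds up.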
Before:
-rw-r--r-- 1 andres andres 683139 Feb 13 04:31 html-broken/bookindex.html
After:
-rw-r--r-- 1 andres andres 442923 Feb 13 12:03 html/bookindex.html
While most of the savings are in bookindex, the rest of the files are reduced
by another ~100kB.
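For reference, the bookindex.html saving implied by the two listings above can be worked out directly. The byte counts are taken from the `ls` output; the variable names are just for illustration.

```python
# Sizes in bytes, copied from the before/after listings above.
before = 683139  # html-broken/bookindex.html
after = 442923   # html/bookindex.html

saved = before - after
percent = 100 * saved / before

print(f"saved {saved} bytes ({percent:.1f}% of the original file)")
```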
WIP patch attached. For now I just adjusted the minimal set of
xmlns="http://www.w3.org/TR/xhtml1/transitional", but I think we should update
all.
Greetings,
Andres Freund
Attachment | Content-Type | Size |
---|---|---|
pg-html-stylesheet.diff | text/x-diff | 1.7 KB |