Header unfolding in archived mail

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-www(at)postgresql(dot)org
Subject: Header unfolding in archived mail
Date: 2013-09-07 22:07:45
Message-ID: 20130907220745.GA188338@tornado.leadboat.com
Lists: pgsql-www

The mailing list web archives display the subject of message
20130603190727(dot)GA360354(at)tornado(dot)leadboat(dot)com as follows:

Partitioning performance: cache stringToNode() ofpg_constraint.ccbin

Note the lack of whitespace after "of". The original message, which you can
see by downloading the mbox for June 2013, conveyed the subject this way:

Subject: Partitioning performance: cache stringToNode() of
	pg_constraint.ccbin

Per RFC 5322, section 2.2.3:

The process of moving from this folded multiple-line representation
of a header field to its single line representation is called
"unfolding". Unfolding is accomplished by simply removing any CRLF
that is immediately followed by WSP. Each header field should be
treated in its unfolded form for further syntactic and semantic
evaluation. An unfolded header field has no length restriction and
therefore may be indeterminately long.

So, the archives should present the subject like this:

Partitioning performance: cache stringToNode() of pg_constraint.ccbin
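For illustration, the RFC's unfolding rule can be sketched in a few lines of Python. This is not the attached patch; the name `unfold` is hypothetical, and accepting a bare LF as well as CRLF is my assumption about how line endings may be stored on disk.

```python
import re

# A line break (CRLF, or bare LF as an assumption for files stored with
# Unix line endings) that is immediately followed by WSP (space or tab).
# The lookahead keeps the WSP itself in the result, as RFC 5322 requires.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(header_value: str) -> str:
    """Return header_value with RFC 5322 folding removed (hypothetical helper)."""
    return _FOLD.sub("", header_value)


folded = "Partitioning performance: cache stringToNode() of\n\tpg_constraint.ccbin"
print(repr(unfold(folded)))
```

Only the line break is removed, so the tab survives between "of" and "pg_constraint.ccbin"; whether to display it as a tab or a space is then a separate rendering decision.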

Gmane and osdir.com do so. MARC and Gmail show a space in place of the tab,
but Gmail converts every subject-line tab to a space. I have attached a
patch, against pgarchives.git, making its unfolding code conform to RFC 5322.
The change also affects headers folded before a space rather than before a
tab, such as 50E31370(dot)5030405(at)cybertec(dot)at. Those have been displaying fine
despite the lack of unfolding because newline-space renders like a space in
HTML. I unit-tested the change, but I did not test the full archives load.

The "raw" message display feature seems to have its own set of rules, and I
failed to find their implementation. Here are the subject lines for the
aforementioned messages according to "raw" display:

Subject: Partitioning performance: cache stringToNode() of pg_constraint.ccbin
Subject: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket
	communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes
	long time to detect n/w breakdown

In one case, "\n\t" from the true raw original (in the mbox file) became " ".
In the other case, two instances of "\n " became "\n\t". Any ideas where that
transformation is coming from?
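
To compare what each stage emits, it helps to look at the exact whitespace in a folded header. A minimal sketch, assuming Python's standard `email` package with its default (compat32) policy, which as far as I know hands back the header value with its folding intact; the helper name `raw_subject` and the sample message are made up for illustration.

```python
import email


def raw_subject(raw_message: str) -> str:
    """Return the Subject value as parsed, without unfolding (hypothetical helper)."""
    # message_from_string uses the compat32 policy by default, which
    # preserves the line breaks and leading whitespace of folded headers.
    msg = email.message_from_string(raw_message)
    return msg["Subject"]


sample = (
    "Subject: Partitioning performance: cache stringToNode() of\n"
    "\tpg_constraint.ccbin\n"
    "\n"
    "body\n"
)
# repr() makes "\n\t" versus "\n " versus " " visible.
print(repr(raw_subject(sample)))
```

Running the same check on the mbox copy and on the "raw" display output would show which one rewrites the whitespace.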

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
unfold-headers-v1.patch text/plain 1.8 KB
