Re: Header unfolding in archived mail

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-www(at)postgresql(dot)org
Subject: Re: Header unfolding in archived mail
Date: 2013-12-09 00:41:19
Message-ID: 20131209004119.GA1266851@tornado.leadboat.com
Lists: pgsql-www

On Sat, Sep 07, 2013 at 06:07:45PM -0400, Noah Misch wrote:
> The mailing list web archives display the subject of message
> 20130603190727(dot)GA360354(at)tornado(dot)leadboat(dot)com as follows:
>
> Partitioning performance: cache stringToNode() ofpg_constraint.ccbin
>
> Note the lack of whitespace after "of". The original message, which you can
> see by downloading the mbox for June 2013, conveyed the subject this way:
>
> Subject: Partitioning performance: cache stringToNode() of
> pg_constraint.ccbin
>
> Per RFC 5322, section 2.2.3:
>
> The process of moving from this folded multiple-line representation
> of a header field to its single line representation is called
> "unfolding". Unfolding is accomplished by simply removing any CRLF
> that is immediately followed by WSP. Each header field should be
> treated in its unfolded form for further syntactic and semantic
> evaluation. An unfolded header field has no length restriction and
> therefore may be indeterminately long.
>
> So, the archives should present the subject like this:
>
> Partitioning performance: cache stringToNode() of pg_constraint.ccbin
>
> Gmane and osdir.com do so. MARC and Gmail show a space in place of the tab,
> but Gmail converts every subject-line tab to a space. I have attached a
> patch, against pgarchives.git, making its unfolding code conform to RFC 5322.
> The change also affects headers folded before a space rather than before a
> tab, such as 50E31370(dot)5030405(at)cybertec(dot)at. Those have been displaying fine
> despite the lack of unfolding because newline-space renders like a space in
> HTML. I unit-tested the change, but I did not test the full archives load.
>
>
> The "raw" message display feature seems to have its own set of rules, and I
> failed to find their implementation. Here are the subject lines for the
> aforementioned messages according to "raw" display:
>
> Subject: Partitioning performance: cache stringToNode() of pg_constraint.ccbin
> Subject: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket
> communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes
> long time to detect n/w breakdown
>
> In one case, "\n\t" from the true raw original (in the mbox file) became " ".
> In the other case, two instances of "\n " became "\n\t". Any ideas where that
> transformation is coming from?

Ping. Any advice on how to test the pgarchives.git change more thoroughly, or
on where I might find the corresponding code affecting "raw" message display?
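
For reference, the RFC 5322 unfolding rule quoted above can be sketched in a
few lines of Python. This is only an illustration, not the code in
pgarchives.git or in the attached patch: the function name unfold() is
hypothetical, and accepting a bare LF in addition to CRLF is an assumption made
because mbox files commonly store LF-only line endings.

```python
import re

# A line break (CRLF per the RFC; a bare LF is also accepted here as an
# assumption about mbox storage) that is immediately followed by WSP.
# The lookahead leaves the space or tab itself in place.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(value: str) -> str:
    """Return a header value with its folding line breaks removed."""
    return _FOLD.sub("", value)


if __name__ == "__main__":
    folded = ("Partitioning performance: cache stringToNode() of\n"
              "\tpg_constraint.ccbin")
    # The tab survives, so "of" and "pg_constraint.ccbin" stay separated.
    print(repr(unfold(folded)))
```

Only the line break is removed; the whitespace that follows it is kept, which
is what preserves the gap after "of" in the subject discussed above.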

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
