Re: Decision by Monday: PQescapeString() vs. encoding violation

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Cc: andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Decision by Monday: PQescapeString() vs. encoding violation
Date: 2025-02-15 20:35:45
Message-ID: 186954494c6b0bf643b1aa42fc67e9e25386ebe0.camel@j-davis.com
Lists: pgsql-hackers

On Fri, 2025-02-14 at 17:27 -0800, Noah Misch wrote:
> I'm attaching a WIP patch from Andres Freund.

I am not suggesting a change, but there's a minor point about the
behavior of the replacement that I'd like to highlight:

Unicode discusses a choice[1]: "An ill-formed subsequence consisting of
more than one code unit could be treated as a single error or as
multiple errors."

The patch implements the latter. Escaping:
<7A F0 80 80 41 7A>
results in:
<7A C0 20 C0 20 C0 20 41 7A>

The Unicode standard suggests[2] that the former approach may provide
more consistency in how it's done, but that doesn't seem important or
relevant for our purposes. I'd favor whichever approach results in
simpler code.

Regards,
Jeff Davis

[1]
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G48534

[2]
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G66453
