From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Decision by Monday: PQescapeString() vs. encoding violation |
Date: | 2025-02-15 20:52:01 |
Message-ID: | 243481.1739652721@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2025-02-15 12:35:45 -0800, Jeff Davis wrote:
>> I am not suggesting a change, but there's a minor point about the
>> behavior of the replacement that I'd like to highlight:
>> Unicode discusses a choice[1]: "An ill-formed subsequence consisting of
>> more than one code unit could be treated as a single error or as
>> multiple errors."
> It seems completely infeasible to me to implement the "single error"
> approach in a minor version. It'd afaict require non-trivial new
> infrastructure. We can't just consume up to the next byte without a high bit,
> because some encodings have subsequent bytes that are not guaranteed to have a
> high bit set.
Yeah. Also, I think the "single error" approach probably depends on
being able to tell the difference between a first byte and a not-first
byte of a multibyte character, something that works in UTF-8 but not
necessarily elsewhere.
As I commented in the security thread, Unicode's recommendations seem
pretty UTF-8-centric; I'm hesitant to adopt them wholesale in code
that has to deal with other encodings.
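
To make the byte-role point concrete, here is a minimal sketch in Python. It is not code from the v5 patch or from libpq; the helper names `utf8_byte_kind` and `skip_ill_formed_utf8` are hypothetical, and the "skip to the next non-continuation byte" rule is a simplification of what a "single error" treatment might look like. It shows that UTF-8 lets you classify a byte from its high bits alone, while in Shift_JIS the second byte of a two-byte character can fall in the ASCII range.

```python
def utf8_byte_kind(b: int) -> str:
    """Classify one byte by its role in UTF-8, looking only at its high bits."""
    if b < 0x80:
        return "ascii"          # 0xxxxxxx
    if b < 0xC0:
        return "continuation"   # 10xxxxxx
    return "lead"               # 11xxxxxx (lead-byte validity is not checked here)


def skip_ill_formed_utf8(data: bytes, pos: int) -> int:
    """Hypothetical resync rule: starting at a bad byte at `pos`, return the
    index of the next byte that is not a UTF-8 continuation byte."""
    pos += 1
    while pos < len(data) and utf8_byte_kind(data[pos]) == "continuation":
        pos += 1
    return pos


# A truncated three-byte UTF-8 sequence (0xE3 0x82) followed by a quote.
# The rule above resynchronizes at the quote, treating both bytes as one error.
broken_utf8 = b"a\xe3\x82'"
resync_at = skip_ill_formed_utf8(broken_utf8, 1)

# KATAKANA LETTER SO (U+30BD) in Shift_JIS is 0x83 0x5C. The second byte has
# no high bit set and is the same value as an ASCII backslash, so neither
# "skip until a byte without the high bit" nor a lead/continuation test based
# on high bits can find the character boundary.
sjis_so = "\u30bd".encode("shift_jis")
trail_has_high_bit = sjis_so[1] >= 0x80
```

With these inputs, `resync_at` lands on the quote character in the UTF-8 case, and `trail_has_high_bit` is false in the Shift_JIS case, which is why a high-bit-based scan is not safe across all supported encodings.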
The v5 patch seems Good Enough(TM) to me. We can refine it later
perhaps; I don't think something like the above would affect
anything that external code should care about.
regards, tom lane