From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | postgresql(at)zr40(dot)nl, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17277: write past chunk when calling normalize() on an empty string |
Date: | 2021-11-10 06:33:29 |
Message-ID: | YYtnue3sc7EXIIwI@paquier.xyz |
Lists: | pgsql-bugs |
On Tue, Nov 09, 2021 at 09:55:08PM +0000, PG Bug reporting form wrote:
> When calling normalize(''), that is, on an empty string, a warning is
> raised: "problem in alloc set ExprContext: detected write past chunk end".
Well, direct callers of unicode_normalize_kc() in v12 and older would
have the same problem. As far as I recall, and after looking at the git
history (60f11b8), this code was not written with this case in mind,
because pg_saslprep() does not allow empty passwords.
> I believe this is due to an error in unicode_norm.c. In unicode_normalize(),
> when recompose is true (that is, when using NFC or NFKC normalization) the
> loop on line 498 will iterate once before checking count < decomp_size. When
> the input is an empty string, this would cause a write outside of the memory
> allocated for recomp_chars.
No, the code does not enter the recomposition loop in this case. The
problem is that target_pos is initialized to 1, so recomp_chars gets
written one byte past the end of its allocation.
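
To make the index arithmetic concrete, here is a rough model of the bookkeeping only. It is not the code in unicode_norm.c: recomp_layout, model_recompose() and writes_past_end() are names made up for this sketch, and the buffer size of decomp_size + 1 elements is an assumption. It never touches memory, it only compares the index of the terminator with the assumed capacity.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model, NOT the PostgreSQL source.  Assumptions: the
 * recomposition buffer holds decomp_size + 1 elements, and target_pos
 * starts at 1 because the first code point is copied unconditionally.
 */
typedef struct
{
    size_t capacity;        /* elements allocated (assumed) */
    size_t terminator_pos;  /* index where the final terminator goes */
} recomp_layout;

static recomp_layout
model_recompose(size_t decomp_size)
{
    recomp_layout r;
    size_t target_pos = 1;  /* slot 0 is taken by the first code point */
    size_t count;

    r.capacity = decomp_size + 1;

    /* Worst case for buffer usage: nothing recomposes, so every
     * remaining code point is copied.  Not entered when decomp_size
     * is 0 or 1. */
    for (count = 1; count < decomp_size; count++)
        target_pos++;

    /* Only the empty input leaves target_pos beyond decomp_size. */
    assert(target_pos <= decomp_size || decomp_size == 0);

    r.terminator_pos = target_pos;
    return r;
}

/* Returns 1 if the terminator would land outside the assumed buffer. */
static int
writes_past_end(size_t decomp_size)
{
    recomp_layout r = model_recompose(decomp_size);

    return r.terminator_pos >= r.capacity;
}
```

Under these assumptions, writes_past_end(0) is the only case that reports a problem: the loop is skipped, yet the terminator still goes to index 1 of a one-element buffer.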
As there could be callers of unicode_normalize[_kc]() outside core,
I'd rather fix that at the source and patch unicode_norm.c. One way
to do that would be to return early once you know that there is nothing
to decompose, right after the loop over decompose_code(), and hand back
decomp_chars set to an empty set of code points, as per the attached.
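
For readers without the attachment at hand, the shape of that early exit can be sketched as below. This is not the attached patch: codepoint, decompose_sketch() and normalize_sketch() are hypothetical stand-ins, the "decomposition" just copies its input, and the recomposition step copies code points without composing anything. The only point is where the empty case returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t codepoint;     /* stand-in for pg_wchar (assumption) */

/* Hypothetical stand-in for the decomposition pass: copies the
 * zero-terminated input unchanged and reports its length. */
static codepoint *
decompose_sketch(const codepoint *input, size_t *decomp_size)
{
    size_t n = 0;
    codepoint *out;

    while (input[n] != 0)
        n++;
    out = malloc((n + 1) * sizeof(codepoint));
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = input[i];
    out[n] = 0;
    *decomp_size = n;
    return out;
}

static codepoint *
normalize_sketch(const codepoint *input, int recompose)
{
    size_t decomp_size = 0;
    size_t target_pos = 1;
    codepoint *decomp_chars = decompose_sketch(input, &decomp_size);
    codepoint *recomp_chars;

    if (decomp_chars == NULL)
        return NULL;

    /* Early exit: with nothing decomposed there is nothing to
     * recompose, so return the empty, terminated array as-is. */
    if (decomp_size == 0 || !recompose)
        return decomp_chars;

    recomp_chars = malloc((decomp_size + 1) * sizeof(codepoint));
    if (recomp_chars == NULL)
    {
        free(decomp_chars);
        return NULL;
    }

    /* Past the early exit, decomp_size >= 1, so starting at 1 is safe. */
    recomp_chars[0] = decomp_chars[0];
    for (size_t count = 1; count < decomp_size; count++)
        recomp_chars[target_pos++] = decomp_chars[count];

    assert(target_pos <= decomp_size);  /* terminator stays in bounds */
    recomp_chars[target_pos] = 0;
    free(decomp_chars);
    return recomp_chars;
}
```

With this shape, an empty input never reaches the code that assumes at least one code point, so callers inside and outside core get a valid empty result without any extra check on their side.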
There may be a point in issuing an error if there is an empty string,
though. Another thing would be to consider if is_normalized() should
return false for an empty string, but we have considered empty strings
as normalized since this has been released:
=# SELECT '' IS NFD NORMALIZED;
 is_normalized
---------------
 t
(1 row)
Treating it this way feels more natural. Still, I can see some Perl
modules that would return false for such a case, by the way. The
normalization docs don't seem to address empty strings directly, except
for the stream-safe text format:
https://www.unicode.org/faq/normalization.html
https://unicode.org/reports/tr15/tr15-51.html
--
Michael
Attachment | Content-Type | Size |
---|---|---|
unicode-norm-fix.patch | text/x-diff | 1.7 KB |