From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Larry Rosenman <ler(at)lerctr(dot)org> |
Cc: | Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Stas Kelvich <stas(dot)kelvich(at)gmail(dot)com>, "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Mac OS: invalid byte sequence for encoding "UTF8" |
Date: | 2016-02-10 23:00:39 |
Message-ID: | 17166.1455145239@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Larry Rosenman <ler(at)lerctr(dot)org> writes:
> On 2016-02-10 16:19, Tom Lane wrote:
>> I looked into the OS X sources, and found that indeed you are right:
>> *scanf processes the input a byte at a time, and applies isspace() to
>> each byte separately, even when the locale is such that that's a
>> clearly insane thing to do. Since this code was derived from FreeBSD,
>> FreeBSD has or once had the same issue. (A look at the freebsd project
>> on github says it still does, assuming that's the authoritative repo.)
>> Not sure about other BSDen.
> Definitive FreeBSD Sources:
> https://svnweb.freebsd.org/base/
Ah, thanks for the link. I'm not totally sure which branch is most
current, but at least on this one, it's still clearly wrong:
https://svnweb.freebsd.org/base/stable/10/lib/libc/stdio/vfscanf.c?revision=291336&view=markup
convert_string(), which handles %s, applies isspace() to individual bytes
regardless of locale. convert_wstring(), which handles %ls, does it more
intelligently ... but as I said upthread, relying on %ls would just give
us a different set of portability problems.
It looks like Artur's patch is indeed what we need to do, along with
looking around for other *scanf() uses that are vulnerable.
regards, tom lane