Re: Mac OS: invalid byte sequence for encoding "UTF8"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Stas Kelvich <stas(dot)kelvich(at)gmail(dot)com>, "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date: 2016-02-10 23:00:39
Message-ID: 17166.1455145239@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Larry Rosenman <ler(at)lerctr(dot)org> writes:
> On 2016-02-10 16:19, Tom Lane wrote:
>> I looked into the OS X sources, and found that indeed you are right:
>> *scanf processes the input a byte at a time, and applies isspace() to
>> each byte separately, even when the locale is such that that's a
>> clearly insane thing to do. Since this code was derived from FreeBSD,
>> FreeBSD has or once had the same issue. (A look at the freebsd project
>> on github says it still does, assuming that's the authoritative repo.)
>> Not sure about other BSDen.

> Definitive FreeBSD Sources:
> https://svnweb.freebsd.org/base/

Ah, thanks for the link. I'm not totally sure which branch is most
current, but at least on this one, it's still clearly wrong:
https://svnweb.freebsd.org/base/stable/10/lib/libc/stdio/vfscanf.c?revision=291336&view=markup
convert_string(), which handles %s, applies isspace() to individual bytes
regardless of locale. convert_wstring(), which handles %ls, does it more
intelligently ... but as I said upthread, relying on %ls would just give
us a different set of portability problems.

It looks like Artur's patch is indeed what we need to do, along with
looking around for other *scanf() uses that are vulnerable.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Larry Rosenman 2016-02-10 23:15:14 Re: Mac OS: invalid byte sequence for encoding "UTF8"
Previous Message Larry Rosenman 2016-02-10 22:39:11 Re: Mac OS: invalid byte sequence for encoding "UTF8"