Re: Mac OS: invalid byte sequence for encoding "UTF8"

From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Stas Kelvich <stas(dot)kelvich(at)gmail(dot)com>, "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date: 2016-02-10 23:15:14
Message-ID: 2f6abbd8c0ce828200b7db0f8e9781b6@thebighonker.lerctr.org
Lists: pgsql-hackers

On 2016-02-10 17:00, Tom Lane wrote:
> Larry Rosenman <ler(at)lerctr(dot)org> writes:
>> On 2016-02-10 16:19, Tom Lane wrote:
>>> I looked into the OS X sources, and found that indeed you are right:
>>> *scanf processes the input a byte at a time, and applies isspace() to
>>> each byte separately, even when the locale is such that that's a
>>> clearly insane thing to do. Since this code was derived from FreeBSD,
>>> FreeBSD has or once had the same issue. (A look at the freebsd project
>>> on github says it still does, assuming that's the authoritative repo.)
>>> Not sure about other BSDen.
>
>> Definitive FreeBSD Sources:
>> https://svnweb.freebsd.org/base/
>
> Ah, thanks for the link. I'm not totally sure which branch is most
> current, but at least on this one, it's still clearly wrong:
> https://svnweb.freebsd.org/base/stable/10/lib/libc/stdio/vfscanf.c?revision=291336&view=markup
> convert_string(), which handles %s, applies isspace() to individual bytes
> regardless of locale. convert_wstring(), which handles %ls, does it more
> intelligently ... but as I said upthread, relying on %ls would just give
> us a different set of portability problems.
>
> It looks like Artur's patch is indeed what we need to do, along with
> looking around for other *scanf() uses that are vulnerable.
>
> regards, tom lane

That would be the current 10.x tree: it is the production branch, and it is
getting ready for 10.3, which is in code slush.

If you want, file a bug at https://bugs.freebsd.org/bugzilla
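
For anyone following along, here is a minimal sketch in C of one way to
split on whitespace without going through *scanf("%s") or a locale-aware
isspace(). It is only an illustration and is not taken from Artur's patch;
the names is_ascii_space() and next_token() are hypothetical. The assumption
is that the input is UTF-8, where every byte of a multibyte character is
>= 0x80, so testing for the six ASCII whitespace bytes can never split a
character in the middle.

```c
#include <stddef.h>

/*
 * ASCII-only whitespace test (hypothetical helper).  It never consults the
 * locale, so bytes >= 0x80 are never reported as whitespace.
 */
static int
is_ascii_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/*
 * Copy the next whitespace-delimited token from s into buf, always
 * NUL-terminating it.  Returns a pointer just past the token, or NULL if
 * no token remains (or buflen is 0).
 *
 * Overlong tokens are consumed fully but truncated to buflen - 1 bytes;
 * this sketch does not try to avoid truncating in mid-character.
 */
static const char *
next_token(const char *s, char *buf, size_t buflen)
{
    size_t n = 0;

    if (buflen == 0)
        return NULL;

    /* skip leading ASCII whitespace */
    while (*s && is_ascii_space((unsigned char) *s))
        s++;
    if (*s == '\0')
        return NULL;

    /* copy bytes up to the next ASCII whitespace byte or end of string */
    while (*s && !is_ascii_space((unsigned char) *s))
    {
        if (n + 1 < buflen)
            buf[n++] = *s;
        s++;
    }
    buf[n] = '\0';
    return s;
}
```

Given the input "caf\xC3\xA0 x" (the bytes C3 A0 encode U+00E0), next_token()
returns the whole word including both bytes of the last character. A
byte-at-a-time scanner would instead stop after C3 on any libc whose
isspace() reports 0xA0 as whitespace in the active locale.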

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler(at)lerctr(dot)org
US Mail: 7011 W Parmer Ln, Apt 1115, Austin, TX 78729-6961
