From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
Cc: | Stas Kelvich <stas(dot)kelvich(at)gmail(dot)com>, "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Mac OS: invalid byte sequence for encoding "UTF8" |
Date: | 2016-02-10 16:58:00 |
Message-ID: | 28139.1455123480@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> writes:
> I agree that the previous patch is wrong. Instead of using the new
> parse_ooaffentry() function, maybe it is better to use sscanf() with the %ls
> format. The %ls format is used to read a wide-character string.
No, that way is going to give you worse portability problems than what
we have now. Older implementations won't have %ls, and even if they
do, they might not have wcstombs() which is the only way you'd get from
libc's idea of wide characters to an encoding we recognize.
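One way to avoid both %ls and wcstombs() is to tokenize raw bytes and treat only ASCII whitespace as a separator. Every byte of a UTF-8 multibyte character is >= 0x80, so no such byte can end a field, and the result does not depend on the C library's locale tables. The helper below, `next_field()`, is a hypothetical sketch under that assumption. It is not the parse_ooaffentry() from the patch under discussion, whose code is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only ASCII space, tab, CR and LF separate fields; bytes >= 0x80 never do. */
static int
is_ascii_sep(char c)
{
    return c != '\0' && strchr(" \t\r\n", c) != NULL;
}

/*
 * Hypothetical helper: copy the next field of *src into dst (at most
 * dstsize - 1 bytes plus a terminating NUL) and advance *src past it.
 * Returns the full field length in bytes, or -1 if no field remains.
 * Longer fields are truncated in dst but still fully consumed.
 */
static int
next_field(const char **src, char *dst, size_t dstsize)
{
    const char *p;
    size_t      n = 0;

    assert(src != NULL && *src != NULL);
    p = *src;

    /* skip leading separators */
    while (is_ascii_sep(*p))
        p++;
    if (*p == '\0')
    {
        *src = p;
        return -1;
    }

    /* copy bytes up to the next ASCII separator; multibyte chars stay whole */
    while (*p != '\0' && !is_ascii_sep(*p))
    {
        if (n + 1 < dstsize)
            dst[n] = *p;
        n++;
        p++;
    }
    if (dstsize > 0)
        dst[n < dstsize ? n : dstsize - 1] = '\0';

    *src = p;
    return (int) n;
}
```

Calling `next_field()` repeatedly on a line such as an affix-file entry yields its fields in order, with a Cyrillic letter returned intact as its two UTF-8 bytes regardless of the current locale.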
> I think this is not a bug. It is normal behavior. On Mac OS, sscanf()
> with the %s format reads the string one character at a time. The size of
> the letter '' is 2 bytes, and sscanf() separates it into two wrong characters.
That argument might be convincing if OSX behaved that way for all
multibyte characters, but it doesn't seem to be doing that. Why is
only '' affected?
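The C standard defines sscanf()'s %s conversion as matching a sequence of non-white-space characters, where white space is whatever isspace() reports in the current LC_CTYPE locale. Without the l modifier it works on individual bytes, so a multibyte character can be split only if one of its bytes is classified as white space. The hypothetical helper `isspace_offsets()` below lists such bytes. It is one way to probe which characters a given platform's %s would break, offered as a diagnostic sketch rather than a confirmed explanation of the behavior reported here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical diagnostic: record the byte offsets in s for which isspace()
 * is true under the current LC_CTYPE locale.  These are exactly the places
 * where sscanf's %s would end a field.  Stores at most max offsets and
 * returns the total number found.
 */
static size_t
isspace_offsets(const char *s, size_t *offsets, size_t max)
{
    size_t found = 0;

    assert(s != NULL && (offsets != NULL || max == 0));
    for (size_t i = 0; s[i] != '\0'; i++)
    {
        /* cast to unsigned char: passing a negative char to isspace() is undefined */
        if (isspace((unsigned char) s[i]))
        {
            if (found < max)
                offsets[found] = i;
            found++;
        }
    }
    return found;
}
```

In the default "C" locale only the standard ASCII white-space bytes are reported. To probe a real platform, call setlocale(LC_CTYPE, "") (or name a specific UTF-8 locale) before running it on the affected word; the result then depends entirely on that platform's locale tables.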
regards, tom lane