From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, hgonzalez(at)gmail(dot)com |
Subject: | Re: [GENERAL] psql weird behaviour with charset encodings |
Date: | 2015-05-23 17:43:06 |
Message-ID: | 20150523174306.GA3974893@tornado.leadboat.com |
Lists: | pgsql-general pgsql-hackers |
On Sat, May 08, 2010 at 09:24:45PM -0400, Tom Lane wrote:
> hgonzalez(at)gmail(dot)com writes:
> > http://sources.redhat.com/bugzilla/show_bug.cgi?id=649
>
> > The last explains why they do not consider it a bug:
>
> > ISO C99 requires for %.*s to only write complete characters that fit below
> > the precision number of bytes. If you are using say UTF-8 locale, but ISO-8859-1
> > characters as shown in the input file you provided, some of the strings are
> > not valid UTF-8 strings, therefore sprintf fails with -1 because of the
> > encoding error. That's not a bug in glibc.
>
> Yeah, that was about the position I thought they'd take.

GNU libc eventually revisited that conclusion and fixed the bug through commit
715a900c9085907fa749589bf738b192b1a2bda5. RHEL 7.1 is fixed, but RHEL 6.6 and
RHEL 5.11 are still affected; the bug will be relevant for another 8+ years.

> So the bottom line here is that we're best off to avoid %.*s because
> it may fail if the string contains data that isn't validly encoded
> according to libc's idea of the prevailing encoding.

Yep. Immediate precisions like %.10s trigger the bug as effectively as %.*s,
so tarCreateHeader() [_tarWriteHeader() in 9.2 and earlier] is also affected.
Switching to strlcpy(), as attached, fixes the bug while simplifying the code.
The bug symptom is error 'pg_basebackup: unrecognized link indicator "0"' when
the name of a file in the data directory is not a valid multibyte string.
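
For illustration only (this is not the attached patch): a byte-oriented copy
never consults the locale, so a name that is not a valid multibyte string
cannot make it fail. The sketch below uses a hypothetical helper, copy_bytes(),
with strlcpy()-like semantics, since strlcpy() itself is not available in every
libc. The 100-byte limit in the usage note assumes the ustar name field.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: copy at most dstsize - 1
 * bytes of src into dst and always NUL-terminate, as strlcpy() does.
 * Returns strlen(src), so a result >= dstsize means the copy was truncated.
 * The bytes are never interpreted as characters, so invalid multibyte
 * sequences are copied like any other data.
 */
size_t
copy_bytes(char *dst, const char *src, size_t dstsize)
{
    size_t      srclen = strlen(src);

    if (dstsize > 0)
    {
        size_t      n = (srclen < dstsize - 1) ? srclen : dstsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

With a zeroed 512-byte header block h, copy_bytes(h, filename, 100) fills the
name field regardless of which bytes filename contains.
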
Commit 6dd9584 introduced a new use of %.*s, in pg_upgrade. It works reliably
for now, because it always runs in the C locale. pg_upgrade never calls
set_pglocale_pgservice() or otherwise sets its permanent locale. It would be
natural for us to fix that someday, at which point non-ASCII database names
would perturb this status output.

It would be good to purge the code of precisions on "s" conversion specifiers,
then Assert(!pointflag) in fmtstr() to catch new introductions. I won't plan
to do it myself, but it would be a nice little defensive change.
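
As a rough picture of what such a check would look for, here is a standalone
scanner that reports whether a format string puts a precision on an "s"
conversion. It is a sketch, not the fmtstr() code: has_string_precision() is a
hypothetical name, and positional arguments such as %1$s are not handled.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical checker: return true if fmt contains an "s" conversion
 * that carries a precision, e.g. "%.*s" or "%-20.10s".
 * Simplification: positional arguments ("%1$s") are not parsed.
 */
bool
has_string_precision(const char *fmt)
{
    const char *p = fmt;

    while ((p = strchr(p, '%')) != NULL)
    {
        bool        precision = false;

        p++;                            /* skip the '%' */
        if (*p == '%')
        {
            p++;                        /* "%%" is a literal percent sign */
            continue;
        }
        p += strspn(p, "-+ #0'");       /* flags */
        p += strspn(p, "0123456789*");  /* field width */
        if (*p == '.')
        {
            precision = true;
            p++;
            p += strspn(p, "0123456789*");  /* precision digits or '*' */
        }
        p += strspn(p, "hlqjztL");     /* length modifiers */
        if (*p == 's' && precision)
            return true;
        if (*p != '\0')
            p++;                        /* step past the conversion character */
    }
    return false;
}
```

Run over the format strings in the tree, a check like this would flag both
%.*s and immediate precisions such as %.10s.
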
Attachment | Content-Type | Size |
---|---|---|
tar-namechars-v2.patch | text/plain | 1.9 KB |