pgsql: Avoid using %c printf format for potentially non-ASCII character

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Avoid using %c printf format for potentially non-ASCII character
Date: 2020-06-29 15:41:31
Message-ID: E1jpvuV-0000Pf-5d@gemulon.postgresql.org
Lists: pgsql-committers

Avoid using %c printf format for potentially non-ASCII characters.

Since %c only passes a C "char" to printf, it's incapable of dealing
with multibyte characters. Passing just the first byte of such a
character leads to an output string that is visibly not correctly
encoded, resulting in undesirable behavior such as encoding conversion
failures while sending error messages to clients.
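A minimal sketch of the failure, using only the standard C library and assuming the input string is UTF-8. The hypothetical helper `format_with_percent_c` reports the first character of a string with `%c`, the pattern this commit removes. For "é" (bytes 0xC3 0xA9), only the lead byte 0xC3 reaches the output, which leaves a truncated UTF-8 sequence in the message.

```c
#include <stdio.h>

/*
 * Hypothetical helper illustrating the old pattern: %c formats a single
 * byte, so only s[0] is written. If s starts with a multibyte UTF-8
 * character, its continuation bytes are silently dropped and the result
 * is no longer valid UTF-8.
 */
int format_with_percent_c(char *buf, size_t bufsize, const char *s)
{
    return snprintf(buf, bufsize, "invalid character \"%c\"", s[0]);
}
```

Calling `format_with_percent_c(buf, sizeof buf, "\xC3\xA9")` writes the lead byte 0xC3 immediately followed by the closing quote, so a client expecting UTF-8 sees an incomplete character.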

We've lived with this issue for a long time because it was inconvenient
to avoid in a portable fashion. However, now that we always use our own
snprintf code, it's reasonable to use the %.*s format to print just one
possibly-multibyte character in a string. (We previously avoided that
obvious-looking answer in order to work around glibc's bug #6530, cf
commits 54cd4f045 and ed437e2b2.)
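The replacement pattern can be sketched in the same standalone style. In PostgreSQL's backend an encoding-aware function such as `pg_mblen()` would supply the character's byte length; this sketch instead assumes UTF-8 and substitutes a hypothetical `utf8_char_len` that inspects only the lead byte. The `%.*s` precision caps the output at exactly that many bytes, so the whole character is copied. Note that the sketch calls the platform's `snprintf` without calling `setlocale()`; the commit's point is that PostgreSQL itself no longer depends on the platform implementation here because it always uses its own.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for an encoding-aware length function such as
 * pg_mblen(): returns the byte length of the UTF-8 character starting
 * at s, judged from its lead byte alone. Continuation or invalid lead
 * bytes are treated as one-byte characters.
 */
int utf8_char_len(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80)
        return 1;               /* ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((c & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((c & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 1;                   /* continuation or invalid byte */
}

/*
 * Report the first character of s using %.*s: the precision limits the
 * output to that character's bytes, so a multibyte character is copied
 * whole. Output also stops early at a NUL, so a short string is safe.
 */
int format_one_char(char *buf, size_t bufsize, const char *s)
{
    return snprintf(buf, bufsize, "invalid character \"%.*s\"",
                    utf8_char_len(s), s);
}
```

For input `"\xC3\xA9z"` the output is `invalid character "é"` with both bytes of é intact; for `"xyz"` it is `invalid character "x"`. The length function must match the encoding of the string being printed, which is why the approach is less attractive in frontend code where that encoding is less certain.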

Hence, run around and fix a bunch of places that used %c to report
a character found in a user-supplied string. For simplicity, I did
not touch places that were emitting non-user-facing debug messages,
or reporting catalog data that should always be ASCII. (It's also
unclear how useful this approach could be in frontend code, where
it's less certain that we know what encoding we're dealing with.)

In passing, improve a couple of poorly-written error messages in
pageinspect/heapfuncs.c.

This is a longstanding issue, but I'm hesitant to back-patch because
of the impact on translatable message strings. In any case this fix
would not work reliably before v12.

Tom Lane and Quan Zongliang

Discussion: https://postgr.es/m/a120087c-4c88-d9d4-1ec5-808d7a7f133d@gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/16e3ad5d143795b05a21dc887c2ab384cce4bcb8

Modified Files
--------------
contrib/hstore/hstore_io.c            | 16 ++++++++++++----
contrib/pageinspect/heapfuncs.c       | 11 ++++++-----
src/backend/utils/adt/encode.c        | 20 +++++++++++++-------
src/backend/utils/adt/jsonpath_gram.y |  4 ++--
src/backend/utils/adt/regexp.c        |  4 ++--
src/backend/utils/adt/varbit.c        | 16 ++++++++--------
src/backend/utils/adt/varlena.c       |  8 ++++----
7 files changed, 47 insertions(+), 32 deletions(-)
