pgsql: Improve to_date/to_number/to_timestamp behavior with multibyte c

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Improve to_date/to_number/to_timestamp behavior with multibyte c
Date: 2017-11-18 17:43:00
Message-ID: E1eG78u-0003f6-OA@gemulon.postgresql.org
Lists: pgsql-committers

Improve to_date/to_number/to_timestamp behavior with multibyte characters.

The documentation says that these functions skip one input character
per literal (non-pattern) format character. Actually, though, they
skipped one input *byte* per literal *byte*, which could be hugely
confusing if either data or format contained multibyte characters.

To fix, adjust the FormatNode representation and parse_format() so
that multibyte format characters are stored as one FormatNode not
several, and adjust the data-skipping bits to advance by pg_mblen()
not necessarily one byte. There's no user-visible behavior change
on the to_char() side, although the internal representation changes.
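
As a rough illustration of the difference (not the actual formatting.c code),
the sketch below contrasts skipping input per literal byte with skipping per
literal character. It assumes UTF-8 input; utf8_char_len(), next_char(),
skip_literal_bytewise() and skip_literal_charwise() are hypothetical names
invented for this example, with utf8_char_len() standing in for what
pg_mblen() does in the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the UTF-8 character
 * starting at s, judged from its lead byte only.  Assumes UTF-8 encoding.
 */
static int
utf8_char_len(const char *s)
{
    unsigned char c;

    assert(s != NULL);
    c = (unsigned char) *s;
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: treat as one byte */
}

/* Advance past one character, never stepping over the terminating NUL. */
static const char *
next_char(const char *s)
{
    int         n = utf8_char_len(s);

    while (n-- > 0 && *s != '\0')
        s++;
    return s;
}

/* Old behavior: skip one input byte per literal format byte. */
static const char *
skip_literal_bytewise(const char *input, const char *literal)
{
    for (; *literal != '\0' && *input != '\0'; literal++)
        input++;
    return input;
}

/* New behavior: skip one input character per literal format character. */
static const char *
skip_literal_charwise(const char *input, const char *literal)
{
    while (*literal != '\0' && *input != '\0')
    {
        literal = next_char(literal);
        input = next_char(input);
    }
    return input;
}
```

With a three-byte literal such as U+5E74 ("\xE5\xB9\xB4") in the format and
"-11" remaining in the data, skip_literal_bytewise() consumes all three bytes
and leaves nothing for the next pattern, while skip_literal_charwise()
consumes only the "-" and leaves "11".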

Commit e87d4965b had already fixed most places where we skip characters
on the basis of non-literal format patterns to advance by characters
not bytes, but this gets one more place, the SKIP_THth macro. I think
everything in formatting.c gets that right now.

It'd be nice to have some regression test cases covering this behavior;
but of course there's no way to do so in an encoding-agnostic way, and
many of the interesting aspects would also require unportable locale
selections. So I've not bothered here.

Discussion: https://postgr.es/m/28186.1510957703@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/976a1a48fc35cde3c750982be64f872c4de4d343

Modified Files
--------------
src/backend/utils/adt/formatting.c | 68 +++++++++++++++++++++++---------------
1 file changed, 41 insertions(+), 27 deletions(-)
