From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Oliver Ford <ojford(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix number skipping in to_number |
Date: | 2017-11-17 22:28:23 |
Message-ID: | 28186.1510957703@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I wrote:
> That leads me to the attached patch. There is more that could be done
> here --- in particular, I'd like to see the character-not-byte-count
> rule extended to literal text. But that seems like fit material for
> a different patch.
Attached is a patch that makes formatting.c more multibyte-aware;
it now handles multibyte characters as single NODE_TYPE_CHAR format
nodes, rather than one node per byte. This doesn't really have much
impact on the output (to_char) side, but on the input side, it
greatly simplifies treating such characters as single characters
rather than multiple ones. An example is that (in UTF8 encoding)
previously we got
u8=# select to_number('$12.34', '€99D99');
 to_number
-----------
      0.34
(1 row)
because the literal euro sign is 3 bytes long and was thought to be
3 literal characters. Now we get the expected result
u8=# select to_number('$12.34', '€99D99');
 to_number
-----------
     12.34
(1 row)
Aside from skipping 1 input character (not byte) per literal format
character, I fixed the SKIP_THth macro, allowing to_date/to_timestamp to
also follow the rule of skipping whole characters, not bytes, for non-data
format patterns. There might be some other places that need similar
adjustments, but I couldn't find any.
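To make the byte-versus-character distinction concrete, here is a minimal standalone sketch in C. It is not the code from the patch: the helpers utf8_char_len, next_char, count_chars and skip_chars are hypothetical names invented for this illustration, and it assumes UTF-8 only. In the backend, the character length would come from pg_mblen() rather than a hand-rolled lead-byte check.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte length of the UTF-8 character starting at s, judged from its lead
 * byte.  Hypothetical helper for illustration only; an invalid lead byte
 * is treated as a 1-byte character.
 */
static size_t
utf8_char_len(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/* Step over one whole character, never running past the terminator. */
static const char *
next_char(const char *s)
{
    size_t      len = utf8_char_len(s);

    while (len-- > 0 && *s != '\0')
        s++;
    return s;
}

/* Number of whole characters (not bytes) in s. */
static size_t
count_chars(const char *s)
{
    size_t      n = 0;

    assert(s != NULL);
    while (*s != '\0')
    {
        s = next_char(s);
        n++;
    }
    return n;
}

/* Advance past n whole characters, stopping early at end of string. */
static const char *
skip_chars(const char *s, size_t n)
{
    assert(s != NULL);
    while (n-- > 0 && *s != '\0')
        s = next_char(s);
    return s;
}
```

With these helpers, the format string "\xE2\x82\xAC" "99D99" (the euro sign written as its three UTF-8 bytes) counts as 6 characters although it is 8 bytes long. Skipping one character of the input '$12.34' leaves "12.34", whereas skipping three, which is what treating the euro sign as 3 literal characters amounted to, leaves ".34", matching the old 0.34 result shown above.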
Not sure about whether/how to add regression tests for this; it's really
impossible to add specific tests in an ASCII-only test file. Maybe we
could put a test or two into collate.linux.utf8.sql, but it seems a bit
off topic for that, and I think that test file hardly gets run anyway.
Note this needs to be applied over the patch I posted at
https://postgr.es/m/3626.1510949486@sss.pgh.pa.us
I intend to commit that fairly soon, but it's not in right now.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fix-multibyte-literal-chars-in-formatting.c.patch | text/x-diff | 7.5 KB |