Re: unexpected character used as group separator by to_char

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Gavan Schneider <list(dot)pg(dot)gavan(at)pendari(dot)org>
Cc: Vincent Veyron <vv(dot)lists(at)wanadoo(dot)fr>, pgsql-general <pgsql-general(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: unexpected character used as group separator by to_char
Date: 2021-03-10 12:58:20
Message-ID: 20210310125820.GA13231@alvherre.pgsql
Lists: pgsql-general

On 2021-Mar-10, Gavan Schneider wrote:

> On 10 Mar 2021, at 16:24, Alvaro Herrera wrote:
>
> > That space (0xe280af) is U+202F, which appears to be used for French and
> > Mongolian languages (exclusively?). It is quite possible that in the
> > future some other language will end up using some different whitespace
> > character, possibly breaking any code you write today -- the use of
> > U+202F appears to be quite recent.
> >
> Drifting off topic a little. That's a proper code point for things that will
> benefit from the whitespace but should still stay together.
> Also it’s not that new, added in 1999 — https://codepoints.net/U+202F

I probably got misled on this whole thing by these change proposals.
https://www.unicode.org/L2/L2019/19116-clarify-nnbsp.pdf
https://www.unicode.org/L2/L2020/20008-core-text.pdf
Apparently prior to this, they (?) had been using/recommending
THIN SPACE U+2009 as separator, which is not non-breaking.

Anyway, it reinforces my point that it's not impossible that some other
locale definition could use U+2009 when printing numbers, or even some
other kind of spacing entity in non-Latin languages etc. So I think
that for truly robust handling you should separate the thing you use for
display from the thing you use to talk to the database.
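To illustrate that separation, here is a minimal Python sketch, not anything from this thread. The function names, the comma as decimal separator, and the two-decimal display are all assumptions. The idea is that the application only ever sends plain numbers to the database and applies grouping at display time; if it must read a formatted string back, it discards any character in Unicode category Zs (space separators) rather than testing for one specific code point.

```python
import unicodedata
from decimal import Decimal


def parse_displayed_number(text: str, decimal_sep: str = ",") -> Decimal:
    """Parse a display string whose group separator is some kind of space.

    Hypothetical helper: it assumes groups are separated only by
    space-like characters and that decimal_sep marks the fraction.
    """
    # Drop every space separator (category Zs): U+0020, U+00A0,
    # U+2009 THIN SPACE, U+202F NARROW NO-BREAK SPACE, and so on.
    compact = "".join(ch for ch in text if unicodedata.category(ch) != "Zs")
    # Decimal expects "." as the decimal point.
    return Decimal(compact.replace(decimal_sep, "."))


def format_for_display(value: Decimal,
                       group_sep: str = "\u202f",
                       decimal_sep: str = ",") -> str:
    """Format a number for humans only; never send this string to the database."""
    plain = f"{value:,.2f}"  # "," groups and "." decimal point
    # Swap separators through a placeholder so they cannot collide.
    return (plain.replace(",", "\x00")
                 .replace(".", decimal_sep)
                 .replace("\x00", group_sep))


shown = format_for_display(Decimal("1234567.89"))
stored = parse_displayed_number(shown)
```

This does not handle locales that group with dots or apostrophes. Keeping the value numeric all the way to the database and formatting only at the edge avoids the parsing step entirely.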

> And the thin space is part of the international standard for breaking up
> large numbers (from 1948), specifically no dots or commas should be used in
> this role. The dot or comma is only to be used for the decimal point!

Interesting -- I didn't know this.

--
Álvaro Herrera Valdivia, Chile
"This is a foot just waiting to be shot" (Andrew Dunstan)
