Re: Does UCS_BASIC have the right CTYPE?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc: "Peter Eisentraut" <peter(at)eisentraut(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject: Re: Does UCS_BASIC have the right CTYPE?
Date: 2023-10-26 21:32:14
Message-ID: 1401159.1698355934@sss.pgh.pa.us
Lists: pgsql-hackers

"Daniel Verite" <daniel(at)manitou-mail(dot)org> writes:
> To me the question of what we should put in pg_collation.collctype
> for the "ucs_basic" collation leads to another question which is:
> why do we even consider collctype in the first place?

For starters, the C locale should certainly act differently from the others.

I'm not sold that arguing from Unicode's behavior to other encodings
makes sense, either. Unicode can get away with defining that there's
only one case-folding rule because they have the luxury of inventing
new code points when the "same" glyph should act differently according
to different languages' rules. Encodings with a small number of code
points don't have that luxury. In particular see the mess around dotted
and dotless I/J in Turkish vs. everywhere else.
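
To make the Turkish case concrete, here is a minimal Python sketch (not PostgreSQL code; the helper names lower_default and lower_turkish are hypothetical, chosen only for illustration). It assumes Python's built-in str.lower() as the locale-independent mapping and the ISO-8859-9 codec as an example of a single-byte encoding used for Turkish. The point it illustrates is that the single byte 0x49 ("I") has to lowercase to different bytes depending on the language rules in force.

```python
# Illustrative sketch only: helper names are hypothetical, not PostgreSQL APIs.

DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def lower_default(s: str) -> str:
    """Locale-independent lowercasing via Python's built-in Unicode mapping."""
    return s.lower()


def lower_turkish(s: str) -> str:
    """Turkish-style lowercasing: 'I' maps to dotless i, dotted capital I maps to 'i'."""
    return s.translate({ord("I"): DOTLESS_I, ord(DOTTED_CAP_I): "i"}).lower()


# The same input byte (0x49) in ISO-8859-9 ...
src = "I".encode("iso8859_9")

# ... lowercases to two different bytes depending on the rule applied.
default_bytes = lower_default("I").encode("iso8859_9")
turkish_bytes = lower_turkish("I").encode("iso8859_9")

print(src.hex(), default_bytes.hex(), turkish_bytes.hex())
```

With only one code point available for "I", the encoding itself cannot say which lowercase form is wanted, so that choice has to come from somewhere else, such as the locale's ctype.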

regards, tom lane
