Re: Built-in CTYPE provider

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-01-22 18:49:56
Message-ID: 2bcd882a-cf20-40fc-84eb-5c5c6365ff56@eisentraut.org
Lists: pgsql-hackers

On 18.01.24 23:03, Jeff Davis wrote:
> On Thu, 2024-01-18 at 13:53 +0100, Peter Eisentraut wrote:
>> I think that would be a terrible direction to take, because it would
>> regress the default sort order from "correct" to "useless".
>
> I don't agree that the current default is "correct". There are a lot of
> ways it can be wrong:
>
> * the environment variables at initdb time don't reflect what the
> users of the database actually want
> * there are so many different users using so many different
> applications connected to the database that no one "correct" sort order
> exists
> * libc has some implementation quirks
> * the version of Unicode that libc is based on is not what you expect
> * the version of libc is not what you expect

These are arguments for why the current defaults are not universally
perfect, but I'd argue that they are still most often the right thing as
the default.

>> Aside from the overall message this sends about how PostgreSQL cares
>> about locales and Unicode and such.
>
> Unicode is primarily about the semantics of characters and their
> relationships. The patches I propose here do a great job of that.
>
> Collation (relationships between *strings*) is a part of Unicode, but
> not the whole thing or even the main thing.

I don't get this argument. Of course, people care about sorting and
sort order. Whether you consider this part of Unicode or adjacent to
it, people still want it.

>> Maybe you don't intend for this to be the default provider?
>
> I am not proposing that this provider be the initdb-time default.

ok

>> But then who would really use it? I mean, sure, some people would, but
>> how would you even explain, in practice, the particular niche of users
>> or use cases?
>
> It's for users who want to respect Unicode and support text from
> international sources in their database; but are not experts on the
> subject and don't know precisely what they want or understand the
> consequences. If and when such users do notice a problem with the sort
> order, they'd handle it at that time (perhaps with a COLLATE clause, or
> sorting in the application).
>
> Vision:
>
> * ICU offers COLLATE UNICODE, locale tailoring, case-insensitive
> matching, and customization with rules. It's the solution for
> everything from "slightly more advanced" to "very advanced".

I am astonished by this. In your world, do users not want their text
data sorted? Do they not care what the sort order is? You consider UCA
sort order an "advanced" feature?
