Re: Built-in CTYPE provider

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-01-18 12:53:36
Message-ID: 67df0672-5bc0-4b2b-b9e0-00e12bdca601@eisentraut.org
Lists: pgsql-hackers

On 12.01.24 03:02, Jeff Davis wrote:
> New version attached. Changes:
>
> * Named collation object PG_C_UTF8, which seems like a good idea to
> prevent name conflicts with existing collations. The locale name is
> still C.UTF-8, which still makes sense to me because it matches the
> behavior of the libc locale of the same name so closely.

I am catching up on this thread. The discussions have been very
complicated, so maybe I didn't get it all.

The patches look pretty sound, but I'm questioning how useful this
feature is and where you plan to take it.

Earlier in the thread, the aim was summarized as

> If the Postgres default was bytewise sorting+locale-agnostic
> ctype functions directly derived from Unicode data files,
> as opposed to libc/$LANG at initdb time, the main
> annoyance would be that "ORDER BY textcol" would no
> longer be the human-favored sort.

I think that would be a terrible direction to take, because it would
regress the default sort order from "correct" to "useless". That is
aside from the overall message this would send about how much PostgreSQL
cares about locales, Unicode, and so on.
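
As a rough illustration of the concern (not taken from the patch), the sketch below contrasts a bytewise sort with an approximation of a human-favored sort. The word list and both function names are made up for this example, and the second function is only a crude stand-in that ignores case and accents; it is not the Unicode Collation Algorithm, nor what libc or ICU actually do.

```python
import unicodedata

# Hypothetical sample data; "\u00e9" is the precomposed e-acute.
WORDS = ["banana", "Zebra", "\u00e9clair", "apple"]


def bytewise_sort(words):
    """Sort by raw UTF-8 bytes, i.e. a memcmp-style ordering."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


def rough_linguistic_sort(words):
    """Crude stand-in for a linguistic sort: ignore case and accents.

    This is NOT the Unicode Collation Algorithm; it only approximates
    what a reader would expect for this small sample.
    """
    def key(w):
        # Decompose, drop combining marks, then fold case.
        decomposed = unicodedata.normalize("NFD", w)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.casefold()

    return sorted(words, key=key)


if __name__ == "__main__":
    print(bytewise_sort(WORDS))
    print(rough_linguistic_sort(WORDS))
```

With this sample, the bytewise order puts the capitalized "Zebra" first and the accented word last, while the approximation gives the order most readers would expect: apple, banana, éclair, Zebra.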

Maybe you don't intend for this to be the default provider? But then
who would really use it? I mean, sure, some people would, but how would
you even explain, in practice, the particular niche of users or use cases?

Maybe if this new provider would be called "minimal", it might describe
the purpose better.

I could see a use for this builtin provider if it also included the
default UCA collation (what COLLATE UNICODE does now). Then it would
provide a "common" default behavior out of the box, and if you want more
fine-tuning, you can go to ICU. There would still be some questions
about making sure the builtin behavior and the ICU behavior are
consistent (different Unicode versions, stock UCA vs CLDR, etc.). But
for practical purposes, it might work.

There would still be a risk with that approach, since it would
permanently marginalize ICU functionality, in the sense that only some
locales would need ICU, and so we might not pay the same amount of
attention to the ICU functionality.

I would be curious what your overall vision is here. Is switching the
default to ICU still your goal? Or do you want the builtin provider to
be the default? Or something else?
