Re: Built-in CTYPE provider

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Built-in CTYPE provider
Date:	2024-01-22 23:33:54
Message-ID:	8f105bd641a2fcbddc7c5f0c2ce60731a70da0de.camel@j-davis.com
Lists:	pgsql-hackers

On Mon, 2024-01-22 at 19:49 +0100, Peter Eisentraut wrote:

> >
> I don't get this argument. Of course, people care about sorting and
> sort order. Whether you consider this part of Unicode or adjacent to
> it, people still want it.

You said that my proposal sends a message that we somehow don't care
about Unicode, and I strongly disagree. The built-in provider I'm
proposing does implement Unicode semantics.

Surely a database that offers UCS_BASIC (a SQL spec feature) isn't
sending a message that it doesn't care about Unicode, and neither is my
proposal.

> >
> > * ICU offers COLLATE UNICODE, locale tailoring, case-insensitive
> > matching, and customization with rules. It's the solution for
> > everything from "slightly more advanced" to "very advanced".
>
> I am astonished by this. In your world, do users not want their text
> data sorted? Do they not care what the sort order is?

I obviously care about Unicode and collation. I've put a lot of effort
recently into contributions in this area, and I wouldn't have done that
if I thought users didn't care. You've made much greater contributions
and I thank you for that.

The logical conclusion of your line of argument would be that libc's
"C.UTF-8" locale and UCS_BASIC simply should not exist. But they do
exist, and for good reason.

One of those good reasons is that only *human* users care about the
human-friendliness of sort order. If Postgres is just feeding the
results to another system -- or an application layer that re-sorts the
data anyway -- then stability, performance, and interoperability matter
more than human-friendliness. (Though Unicode character semantics are
still useful even when the data is not going directly to a human.)

> You consider UCA
> sort order an "advanced" feature?

I said "slightly more advanced" compared with "basic". "Advanced" can
be taken in either a positive way ("more useful") or a negative way
("complex"). I'm sorry for the misunderstanding, but my point was this:

* The builtin provider is for people who are fine with code point order
and no tailoring, but want Unicode character semantics, collation
stability, and performance.

* ICU is the right solution for anyone who wants human-friendly
collation or tailoring, and is willing to put up with some collation
stability risk and lower collation performance.

Both have their place and the user is free to mix and match as needed,
thanks to the COLLATE clause for columns and queries.
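
To make the contrast between the two bullets concrete, here is a minimal sketch in Python rather than PostgreSQL. It relies only on the fact that Python compares strings by Unicode code point, which resembles the code point order described for the builtin provider. The case-folded sort is a crude stand-in for a "human-friendly" order and is an assumption made for illustration; it is not the Unicode Collation Algorithm that ICU implements, and the word list is invented.

```python
# Illustration only: Python compares str values by Unicode code point,
# similar in spirit to a code-point-order collation.
words = ["banana", "Zebra", "apple", "\u00c4pfel", "zebra"]

# Code point order: uppercase ASCII sorts before lowercase ASCII,
# and non-ASCII letters such as U+00C4 sort after all ASCII letters.
code_point_order = sorted(words)

# Crude approximation of a friendlier order: fold case before comparing.
# This is NOT UCA; a real ICU collation would also place U+00C4 near "a".
folded_order = sorted(words, key=str.casefold)

print(code_point_order)
print(folded_order)
```

Code point order is cheap to compute and does not change when a collation library is upgraded, but it puts "Zebra" ahead of "apple". The folded order reads more naturally, yet still leaves the accented word at the end, which is the kind of gap that locale-aware collation is meant to close.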

Regards,
Jeff Davis

In response to

Re: Built-in CTYPE provider at 2024-01-22 18:49:56 from Peter Eisentraut
