Re: Order changes in PG16 since ICU introduction

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Regina Obe <lr(at)pcorp(dot)us>, Sandro Santilli <strk(at)kbt(dot)io>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-05-22 19:03:36
Message-ID: 4e3281d1ba5950dbe3b5d3c8d511da4c5751217a.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2023-05-22 at 14:27 +0200, Peter Eisentraut wrote:
> The rules are for setting whatever sort order you like.  Maybe you
> want
> to sort + before - or whatever.  It's like, if you don't like it,
> build
> your own.

A build-your-own feature is fine, but it's not completely zero cost.

There some risk that rules specified for ICU version X fail to load for
ICU version Y. If that happens to your database default collation, you
are in big trouble. The risk of failing to load a language tag in a
later version, especially one returned by uloc_toLanguageTag() in
strict mode, is much lower. We can reduce the risk by allowing rules
only for CREATE COLLATION (not CREATE DATABASE), and see what users do
with it first, and consider adding it to CREATE DATABASE later.

We can also try to explain in the docs that it's a build-it-yourself
kind of feature (use it if you see a purpose, otherwise ignore it),
though I'm not sure quite how we should word it.

And I'm skeptical that we don't have a single plausible end-to-end user
story. I just can't think of any reason someone would need something
like this, given how flexible the collation settings in the language
tags are. The best case I can think of is if someone is trying to make
an ICU collation that matches some non-ICU collation in another system,
which sounds hard; but perhaps it's reasonable to do in cases where it
just needs to work well-enough in some limited case.

Also, do we have an answer as to why specifying the rules as '' is not
the same as not specifying any rules[1]?

[1]
https://www.postgresql.org/message-id/36a6e89689716c2ca1fae8adc8e84601a041121c.camel@j-davis.com

> The co settings are just everything else.
> They are not parametric, they are just some other sort order that
> someone spelled out explicitly.

This sounds like another case where we can't really tell the user why
they would want to use a specific "co" setting; they should only use it
if they already know they want it. Is there some way we can word that
in the documentation so that people don't misuse them?

For instance, one of them is called "emoji". I'm sure a lot of
applications use emoji (or at least might encounter them), should they
always use co-emoji, or would some people who are using emoji not want
it? Can it be combined with "ks" or other "k*" settings?

What I'm trying to avoid is users seeing something in the documentation
and using it without it really being a good fit for their problem. Then
they see something unexpected, and need to rebuild all of their indexes
or something.

> > * I don't understand what "kc" means if "ks" is not set to
> > "level1".
>
> There is an example here:
> https://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings#colcaselevel

Interesting, thank you.

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2023-05-22 19:23:59 Re: PostgreSQL 16 Beta 1 release announcement draft
Previous Message Bruce Momjian 2023-05-22 18:07:01 Re: PG 16 draft release notes ready