Losing my latin on Ordering...

From: Dominique Devienne <ddevienne(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Losing my latin on Ordering...
Date: 2023-02-14 09:31:00
Message-ID: CAFCRh--xt-J8awOavhB216kom6TQnaP35TTVEQQS5bHH7gMemQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi. Porting a unit test to PostgreSQL, we got a failure related to ordering.

We've distilled it to the below. The DB is en_US.UTF-8, and the sorting we
get
does not make sense to me. The same prefix can be sorted differently based
on the suffix apprently, which doesn't make any sense to me.

Surely sorting should be "constant left-to-right", no? What are we missing?

I'm already surprised (star) comes before (space), when the latter "comes
before" the former in both ASCII and UTF-8, but that the two "Foo*" and
"Foo " prefixed pairs are not clustered after sorting is just mistifying to
me. So how come?

For now we can work-around this by explicitly adding the `collate "C"` on
the queries underlying that particular test, but that would be wrong in the
general case of international strings to sort, so I'd really like to
understand what's going on.

Thanks, --DD

PS: if I try "en_US.UTF-8" or "en_US"."UTF-8" for the collate, it fails.
How come what pg_database.datcollate displays is not a valid value for
collate?

PPS: We tried on v12 and v13 I believe. Somehow my v14.2 on Windows doesn't
have en_US as a collation...

ddevienne=> with t(v) as (values ('Foo All'), ('Foo*'), ('Foo Brief'),
('Foo*All')) select '<'||v||'>', v::bytea from t order by v collate "en_US";
?column? | v
-------------+----------------------
<Foo*> | \x466f6f2a
<Foo All> | \x466f6f20416c6c
<Foo*All> | \x466f6f2a416c6c
<Foo Brief> | \x466f6f204272696566
(4 rows)

ddevienne=> with t(v) as (values ('Foo All'), ('Foo*'), ('Foo Brief'),
('Foo*All')) select '<'||v||'>', v::bytea from t order by v collate "C";
?column? | v
-------------+----------------------
<Foo All> | \x466f6f20416c6c
<Foo Brief> | \x466f6f204272696566
<Foo*> | \x466f6f2a
<Foo*All> | \x466f6f2a416c6c
(4 rows)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Laurenz Albe 2023-02-14 10:23:19 Re: Losing my latin on Ordering...
Previous Message Alvaro Herrera 2023-02-14 09:27:26 Re: Support logical replication of DDLs