pg_upgrade fails to preserve old versions of the predefined collations

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_upgrade fails to preserve old versions of the predefined collations
Date: 2019-11-28 20:08:08
Message-ID: 01adb62f-b114-1237-63ae-b7f030a18389@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

When dealing with an OS upgrade, a some kind of anomaly related to
collations was found.
Suppose, we have Debian 8 with postgresql 12 installed.
Then we create a custom collation:
CREATE COLLATION russian (provider=icu, locale='ru_RU');
and
SELECT oid, collname, collnamespace, collprovider, collversion FROM
pg_collation WHERE collname like 'ru%';
returns
 12847 | ru-RU-x-icu |            11 | i            | 58.0.0.50
...
 16384 | russian     |          2200 | i            | 58.0.0.50
Then let's create two tables with text columns and indexes and fill them
with some data:
CREATE TABLE test_icu_ru (f1 varchar COLLATE "ru-RU-x-icu", i int);
INSERT INTO test_icu_ru SELECT chr(x), x FROM generate_series(1, 2000)
as y(x); CREATE INDEX ON test_icu_ru (f1);

CREATE TABLE test_icu_russian (f1 varchar COLLATE "russian", i int);
INSERT INTO test_icu_russian SELECT chr(x), x FROM generate_series(1,
2000) as y(x); CREATE INDEX ON test_icu_russian (f1);

Perform two test queries:
postgres=# select * from test_icu_ru where f1=chr(821);
 f1 |  i 
----+-----
  ̵ | 821
(1 row)

postgres=# select * from test_icu_russian where f1=chr(821);
 f1 |  i 
----+-----
  ̵ | 821
(1 row)

postgres=# EXPLAIN select * from test_icu_ru where f1=chr(821);
                                      QUERY
PLAN                                     
--------------------------------------------------------------------------------------
 Index Scan using test_icu_ru_f1_idx on test_icu_ru  (cost=0.28..8.29
rows=1 width=6)
   Index Cond: ((f1)::text = '̵'::text)
(2 rows)

postgres=# EXPLAIN select * from test_icu_russian where f1=chr(821);
                                           QUERY
PLAN                                          
------------------------------------------------------------------------------------------------
 Index Scan using test_icu_russian_f1_idx on test_icu_russian 
(cost=0.28..8.29 rows=1 width=6)
   Index Cond: ((f1)::text = '̵'::text)
(2 rows)
(The indexes are indeed used by the above queries.)

Now suppose that the OS is upgraded to Debian 9 (or the pgdata just
moved to Debian 9 with the postgresql 12).
The same queries return:
postgres=# select * from test_icu_ru where f1=chr(821);
WARNING:  collation "ru-RU-x-icu" has version mismatch
DETAIL:  The collation in the database was created using version
58.0.0.50, but the operating system provides version 153.64.29.
HINT:  Rebuild all objects affected by this collation and run ALTER
COLLATION pg_catalog."ru-RU-x-icu" REFRESH VERSION, or build PostgreSQL
with the right library version.
 f1 | i
----+---
(0 rows)

postgres=# select * from test_icu_russian where f1=chr(821);
WARNING:  collation "russian" has version mismatch
DETAIL:  The collation in the database was created using version
58.0.0.50, but the operating system provides version 153.64.29.
HINT:  Rebuild all objects affected by this collation and run ALTER
COLLATION public.russian REFRESH VERSION, or build PostgreSQL with the
right library version.
 f1 | i
----+---
(0 rows)

We get no data due to the real collation/sort order change but the
warning says what to do.
The query presented at
https://www.postgresql.org/docs/12/sql-altercollation.html returns:
        Collation        |               Object               
-------------------------+-------------------------------------
 collation "ru-RU-x-icu" | column f1 of table test_icu_ru
 collation "ru-RU-x-icu" | index test_icu_ru_f1_idx
 collation russian       | column f1 of table test_icu_russian
 collation russian       | index test_icu_russian_f1_idx
So the documented behavior is observed.

But after pg_upgrade:
pg_createcluster 12 new
/usr/lib/postgresql/12/bin/pg_upgrade -b /usr/lib/postgresql/12/bin -B
/usr/lib/postgresql/12/bin -d /etc/postgresql/12/main -D
/etc/postgresql/12/new
In the new cluster the same queries return:
postgres=# select * from test_icu_russian where f1=chr(821);
WARNING:  collation "russian" has version mismatch
DETAIL:  The collation in the database was created using version
58.0.0.50, but the operating system provides version 153.64.29.
HINT:  Rebuild all objects affected by this collation and run ALTER
COLLATION public.russian REFRESH VERSION, or build PostgreSQL with the
right library version.
 f1 | i
----+---
(0 rows)

postgres=# select * from test_icu_ru where f1=chr(821);
 f1 | i
----+---
(0 rows)
(There is no warning for the predefined collation now.)

The query presented at
https://www.postgresql.org/docs/12/sql-altercollation.html returns:
     Collation     |               Object               
-------------------+-------------------------------------
 collation russian | column f1 of table test_icu_russian
 collation russian | index test_icu_russian_f1_idx
(2 rows)

and
SELECT oid, collname, collnamespace, collprovider, collversion FROM
pg_collation WHERE collname like 'ru%';
returns
  oid  |  collname   | collnamespace | collprovider | collversion
-------+-------------+---------------+--------------+-------------
 12884 | ru-BY-x-icu |            11 | i            | 153.64.29
 12885 | ru-KG-x-icu |            11 | i            | 153.64.29
 12886 | ru-KZ-x-icu |            11 | i            | 153.64.29
 12887 | ru-MD-x-icu |            11 | i            | 153.64.29
 12888 | ru-RU-x-icu |            11 | i            | 153.64.29
 12889 | ru-UA-x-icu |            11 | i            | 153.64.29
 12883 | ru-x-icu    |            11 | i            | 153.64.29
 12329 | ru_RU       |            11 | c            |
 12328 | ru_RU.utf8  |            11 | c            |
 16402 | russian     |          2200 | i            | 58.0.0.50
So only the custom collation' version is actual, but predefined ones
correspond to a newer libicu, but not to actual data.

For all that, REINDEX repairs both indexes:
postgres=# REINDEX INDEX test_icu_russian_f1_idx;
REINDEX
postgres=# select * from test_icu_russian where f1=chr(821);
 f1 |  i 
----+-----
  ̵ | 821
(1 row)

postgres=# REINDEX INDEX test_icu_ru_f1_idx;
REINDEX
postgres=# select * from test_icu_ru where f1=chr(821);
 f1 |  i 
----+-----
  ̵ | 821
(1 row)

So for now it seems dangerous to use predefined collations as their old
versions are not preserved by pg_upgrade and the user doesn't know which
indexes affected by the actual ICU collation changes.

Best regards,
Alexander

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-11-28 20:25:02 Re: pg_upgrade fails to preserve old versions of the predefined collations
Previous Message Mark Dilger 2019-11-28 18:51:48 Do XID sequences need to be contiguous?