Re: Collation versioning

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2018-09-16 08:19:37
Message-ID: CAEepm=33gHF92UD8bkoZzzbs+GkVhHFzgV4CPf8ZxFRqZ_UCHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 13, 2018 at 7:03 PM Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 12/09/2018 13:25, Christoph Berg wrote:
> > Re: Peter Eisentraut 2018-09-12 <0447ec7b-cdb6-7252-7943-88a4664e7bb7(at)2ndquadrant(dot)com>
> >>> Naive idea: make that catalog shared? Collations are system-wide after
> >>> all.
> >>
> >> By the same argument, extensions should be shared, but they are not.
> >
> > But extensions put a lot of visible stuff into a database, whereas a
> > collation is just a line in some table that doesn't get into the way.
>
> How about C functions? They are just a system catalog representation of
> something that exists on the OS.
>
> Anyway, we also want to support application-specific collation
> definitions, so that users can CREATE COLLATION
> "my_specific_requirements" and use that that in their application, so
> global collations wouldn't be appropriate for that.
>
> Moreover, the fix for a collation version mismatch is, in the simplest
> case, to go around and REINDEX everything. Making the collation or
> collation version global doesn't fix that. It would actually make it
> harder because you couldn't run ALTER COLLATION REFRESH VERSION until
> after you have rebuilt all affected objects *in all databases*.

Here's one idea I came up with. It involves a new kind of magic. The
goals are:

1. Support versioning for the libc provider, including for the
default collation.
2. Support ICU for the default collation.
3. Fix the tracking of when reindexes need to be rebuilt, so that you
can't get it wrong (as you're alluding to above).

Changes:

1. Drop the datcollate and datctype columns from pg_database.
2. In CheckMyDatabase() or elsewhere in backend initialisation, get
that information instead by loading the pg_collation row with OID =
DEFAULT_COLLATION_OID.
3. Don't put COLLPROVIDER_DEFAULT into the default collation
collprovider column, instead give it a concrete provider value, ie
COLLPROVIDER_LIBC.
4. After creating a new database, update that row as appropriate in
the new database (!). Or find some other way to write a new table out
and switch it around, or something like that. That is, if you say
CREATE DATABASE foo LC_COLLATE = 'xx_XX', COLLATION_PROVIDER = libc
then those values somehow get written into the default pg_collation
row in the *new* database (so at that point it's not a simple copy of
the template database).
5. Drop the collversion column from pg_collation. Get rid of the
REFRESH VERSION command. Instead, add a new column indcollversion to
pg_index. It needs to be an array of text (not sure if that is a
problem in a catalog), with elements that correspond to the elements
of indcollation.
6. Do the check and log warnings when we first open each index.
7. Update indcollversion at index creation and whenever we REINDEX.

I haven't actually tried any of this so I'm not sure if I'm missing
something other than the inherent difficulty of updating a row in a
table in a database you're not connected to...

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Travers 2018-09-16 10:38:41 Re: Code of Conduct plan
Previous Message Christoph Berg 2018-09-16 07:48:34 Re: [patch] Support LLVM 7