Re: Changing collate & ctype for an existing database

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: 'rihad' <rihad(at)mail(dot)ru>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Changing collate & ctype for an existing database
Date: 2017-07-12 09:54:43
Message-ID: A737B7A37273E048B164557ADEF4A58B53A81CA2@ntex2010i.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

rihad wrote:
> Hi there. We have a working database that was unfortunately created by
> initdb with default ("C") collation & ctype. All other locale specific
> settings have the value en_US.UTF-8 in postgresql.conf. The database
> itself is multilingual and all its data is stored in UTF-8. Sorting
> doesn't work correctly, though. To fix that, can I just do this:
>
>
> update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
> where datname='mydb';
>
>
> This does seem to work on a testing copy of the database, i.e. select
> lower('БлаБлаБла') now works correctly when connected to that database.
>
>
> Is there still any chance for corrupting data by doing this, or indexes
> stopping working etc?
>
> p.s.: postgres 9.6.3

As explained, yes. Indexes on string columns will be corrupted.

See this example:

test=# CREATE DATABASE breakme LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
test=# \c breakme
breakme=# CREATE TABLE sort(id integer PRIMARY KEY, val text NOT NULL);
breakme=# INSERT INTO sort VALUES (1, 'LITTLE'), (2, 'big'), (3, 'b-less');
breakme=# CREATE INDEX ON sort(val);
breakme=# SET enable_seqscan=off; -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌────┬────────┐
│ id │ val │
├────┼────────┤
│ 1 │ LITTLE │
│ 3 │ b-less │
│ 2 │ big │
└────┴────────┘
(3 rows)

breakme=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='breakme';
breakme=# \c breakme
breakme=# SET enable_seqscan=off; -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌────┬────────┐
│ id │ val │
├────┼────────┤
│ 1 │ LITTLE │
│ 3 │ b-less │
│ 2 │ big │
└────┴────────┘
(3 rows)

breakme=# SET enable_seqscan=on; -- this and the following force sequential scan
breakme=# SET enable_bitmapscan=off;
breakme=# SET enable_indexscan=off;
breakme=# SET enable_indexonlyscan=off;
breakme=# SELECT * FROM sort ORDER BY val; -- this returns the correct order
┌────┬────────┐
│ id │ val │
├────┼────────┤
│ 2 │ big │
│ 3 │ b-less │
│ 1 │ LITTLE │
└────┴────────┘
(3 rows)

As you see, your index is still sorted according to the C collation
and scanning it returns wrong results.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message dpat 2017-07-12 13:08:31 Manage slot in logical/pglogical replication
Previous Message Albe Laurenz 2017-07-12 09:31:22 Re: loading file with en dash character into postgres 9.6.1 database