From: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
---|---|
To: | 'rihad' <rihad(at)mail(dot)ru>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Changing collate & ctype for an existing database |
Date: | 2017-07-12 09:54:43 |
Message-ID: | A737B7A37273E048B164557ADEF4A58B53A81CA2@ntex2010i.host.magwien.gv.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
rihad wrote:
> Hi there. We have a working database that was unfortunately created by
> initdb with default ("C") collation & ctype. All other locale specific
> settings have the value en_US.UTF-8 in postgresql.conf. The database
> itself is multilingual and all its data is stored in UTF-8. Sorting
> doesn't work correctly, though. To fix that, can I just do this:
>
>
> update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
> where datname='mydb';
>
>
> This does seem to work on a testing copy of the database, i.e. select
> lower('БлаБлаБла') now works correctly when connected to that database.
>
>
> Is there still any chance for corrupting data by doing this, or indexes
> stopping working etc?
>
> p.s.: postgres 9.6.3
As explained, yes. Indexes on string columns will be corrupted.
See this example:
test=# CREATE DATABASE breakme LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
test=# \c breakme
breakme=# CREATE TABLE sort(id integer PRIMARY KEY, val text NOT NULL);
breakme=# INSERT INTO sort VALUES (1, 'LITTLE'), (2, 'big'), (3, 'b-less');
breakme=# CREATE INDEX ON sort(val);
breakme=# SET enable_seqscan=off; -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌────┬────────┐
│ id │ val │
├────┼────────┤
│ 1 │ LITTLE │
│ 3 │ b-less │
│ 2 │ big │
└────┴────────┘
(3 rows)
breakme=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='breakme';
breakme=# \c breakme
breakme=# SET enable_seqscan=off; -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌────┬────────┐
│ id │ val │
├────┼────────┤
│ 1 │ LITTLE │
│ 3 │ b-less │
│ 2 │ big │
└────┴────────┘
(3 rows)
breakme=# SET enable_seqscan=on; -- this and the following force sequential scan
breakme=# SET enable_bitmapscan=off;
breakme=# SET enable_indexscan=off;
breakme=# SET enable_indexonlyscan=off;
breakme=# SELECT * FROM sort ORDER BY val; -- this returns the correct order
┌────┬────────┐
│ id │ val │
├────┼────────┤
│ 2 │ big │
│ 3 │ b-less │
│ 1 │ LITTLE │
└────┴────────┘
(3 rows)
As you see, your index is still sorted according to the C collation
and scanning it returns wrong results.
Yours,
Laurenz Albe
From | Date | Subject | |
---|---|---|---|
Next Message | dpat | 2017-07-12 13:08:31 | Manage slot in logical/pglogical replication |
Previous Message | Albe Laurenz | 2017-07-12 09:31:22 | Re: loading file with en dash character into postgres 9.6.1 database |