Ways to change a database collation with removing duplicates

From: Alexey Murz Korepov <murznn(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Ways to change a database collation with removing duplicates
Date: 2023-03-11 18:23:43
Message-ID: CAL5pyKvXL69bRrONgkutz=xeCUQz6QO7GN_JndqBD0Mq5OJJ-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Could anyone suggest to me the ways to change a database collation with
removing all the duplicates, caused by this change?

I have a pretty large database (around 500 Gb) that was created with
`en_US.UTF-8` collation, but the new version of the application requires
that the collation should be strictly `C`.

I can successfully create a dump of the old database using `pgdump`.

But when I'm importing the dump to the new database with `COLLATE=C`, I see
a lot of errors on ALTER TABLE when creating primary keys, and the same -
for `CREATE INDEX` commands:

ALTER TABLE
ERROR: could not create unique index "access_tokens_pkey"
DETAIL: Key (id)=(16734) is duplicated.
ERROR: could not create unique index "access_tokens_token_key"
DETAIL: Key (token)=(XXX) is duplicated.
CONTEXT: parallel worker
ERROR: could not create unique index "account_data_uniqueness"
DETAIL: Key (user_id, account_data_type)=(@username:XXX,
im.vector.setting.breadcrumbs) is duplicated.

CREATE INDEX
ERROR: could not create unique index "e2e_cross_signing_keys_idx"
DETAIL: Key (user_id, keytype,
stream_id)=(@-----------------:matrix.org, master, 1606172) is
duplicated.
ERROR: could not create unique index "e2e_cross_signing_keys_stream_idx"
DETAIL: Key (stream_id)=(1779009) is duplicated.

So, could anyone give some advice on how to perform the collation change
with cleaning out all the duplicates? Thanks!

--
Best regards,
Alexey Murz Korepov.
E-mail: murznn(at)gmail(dot)com
Messengers: Matrix - https://matrix.to/#/@murz:ru-matrix.org Telegram -
@MurzNN

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Joseph Hammerman 2023-03-11 18:49:10 Tooling for per table autovacuum tuning
Previous Message Holger Jakobs 2023-03-10 18:06:45 Re: LOG: could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat"