Re: Ways to change a database collation with removing duplicates

From: Alexey Murz Korepov <murznn(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Erik Wienhold <ewie(at)ewie(dot)name>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Ways to change a database collation with removing duplicates
Date: 2023-03-12 05:52:00
Message-ID: CAL5pyKt6fW7wErKUf0ev7i6+=rszdR3Qo4rxbOH9kjCiwo9F2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Thanks for the explanation! Yes, it seems this is not related to
collations, because I see an integer duplicate there now, thanks for
pointing to this. Maybe the restoration of the database is started twice
somehow, that produces that duplicates, so I will try to restart the
migration process from scratch.

On Sun, Mar 12, 2023 at 12:53 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Erik Wienhold <ewie(at)ewie(dot)name> writes:
> > On 11/03/2023 19:23 CET Alexey Murz Korepov <murznn(at)gmail(dot)com> wrote:
> >> Could anyone suggest to me the ways to change a database collation with
> >> removing all the duplicates, caused by this change?
>
> > Collations can only affect uniqueness if they are nondeterministic or if
> you
> > have functional indexes, e.g. using lower(text) for a case-insensitive
> unique
> > index. Otherwise the collations only affect text ordering.
>
> Yeah. I suspect that what actually happened here was a previous change in
> the host system's sort ordering (cf [1]), leading to text indexes becoming
> functionally corrupt and unable to enforce uniqueness correctly, after
> which you accumulated some unintentional duplicates. If you try
> reindex'ing on the source database you'll probably find that it fails with
> the same errors. I don't know of any automatic tools for fixing up such
> duplications, and wouldn't trust one hugely anyway --- you'll probably
> need manual curation of the fixes.
>
> regards, tom lane
>
> [1] https://wiki.postgresql.org/wiki/Locale_data_changes
>

--
Best regards,
Alexey Murz Korepov.
E-mail: murznn(at)gmail(dot)com
Messengers: Matrix - https://matrix.to/#/@murz:ru-matrix.org Telegram -
@MurzNN

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Laurenz Albe 2023-03-12 13:47:18 Re: Tooling for per table autovacuum tuning
Previous Message Erik Wienhold 2023-03-11 21:23:11 Re: LOG: could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat"