Re: Mixed Locales and Upgrading

From: Don Seiler <don(at)seiler(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Mixed Locales and Upgrading
Date: 2020-03-17 13:06:52
Message-ID: CAHJZqBD0tv8VswMA7o=8L4DwVKWVxyx9cCv-xzr6-ZQke3GZ8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Mar 16, 2020 at 10:28 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> Egad.
>

My thoughts exactly.

> Well, in principle you could likewise manually update pg_database's
> datcollate and datctype columns to say "en_US.utf8". However, there's
> a much bigger problem here --- what steps if any did this cowboy take
> to ensure that the data inside the database was valid UTF8?
>

No steps that I've seen from the chat history I've been able to search. I'm
not sure if there was an (invalid) assumption that LATIN1 is a subset of
UTF-8 or if it was done in a panic to get the import/update working years
ago.

> I don't think you should use pg_upgrade here at all. A dump/restore
> is really the only way to make sure that you have validly encoded data.
>

That is what I thought, and probably not what they'll want to hear given
the downtime involved. Even with parallel dump/restore jobs, I imagine it
will take quite a while (this first DB is almost 900GB).

> However, if it's only one database out of a bunch, you could do something
> like
>
> * pg_dump that one database;
> * drop said database;
> * pg_upgrade everything else;
> * restore that one database from dump.
>

In the case of this busy cluster, the layout is like this:

postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access
privileges
-----------+----------+----------+------------+------------+-----------------------
foooo_all | postgres | UTF8 | en_US | en_US |
postgres | postgres | LATIN1 | en_US | en_US | =Tc/postgres
+
| | | | |
postgres=CTc/postgres
template0 | postgres | LATIN1 | en_US | en_US | =c/postgres
+
| | | | |
postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
(4 rows)

So the template1 DB was dropped and recreated with the collate and ctype as
well and then set to be a template again. But I believe that was well after
foooo_all was changed (and so probably no need for the template1 change).
In this case, if this is the only DB in the cluster, would it make sense to
just create a new one as en_US.UTF-8 and then restore a dump of foooo_all
into a pre-created en_US.UTF-8 DB?

We have a few other mixed environments similar to this as well. Some have
postgres and both template DBs with this same UTF8/en_US/en_US
configuration.

Is logical replication an option here? If the target DB were setup as
en_US.UTF-8 across the board, would logical replication safely replicate
and convert the data until we could then cut over?

Thanks,
Don.
--
Don Seiler
www.seiler.us

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Don Seiler 2020-03-17 13:45:41 Re: Mixed Locales and Upgrading
Previous Message Paul Foerster 2020-03-17 10:37:02 Re: Automatic failover