Mixed Locales and Upgrading

From: Don Seiler <don(at)seiler(dot)us>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Mixed Locales and Upgrading
Date: 2020-03-16 14:24:13
Message-ID: CAHJZqBAS6kuTWZwSbu8KqfhJ8dapvTegyA1QweAHAmmQveUyRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Good morning,

I have a few clusters that need to be upgraded from PG 9.6 or 10 to 12.
Normally I'd just plan to pg_upgrade the lot and be good to go. However
I've found that quite a few (including our biggest/busiest database) have
mixed locales. In the case of the biggest/busiest database, the cluster was
created with locale en_US (NOT en_US.UTF-8), and so the databases have
encoding LATIN1.

However this database has encoding UTF8 while still having ctype and
collation of en_US. I've since found that when this was last upgraded, they
ran "update pg_database set encoding = pg_char_to_encoding('UTF8') where
datname = 'test';" to change the encoding. In my testing, pg_upgrade breaks
when trying to restore this since UTF8 isn't supported in en_US for the CREATE
DATABASE command used during pg_restore:

command: "/usr/lib/postgresql/12/bin/pg_restore" --host /var/lib/postgresql
--port 50432 --username postgres --create --exit-on-error --verbose
--dbname template1 "pg_upgrade_dump_16385.custom" >>
"pg_upgrade_dump_16385.log" 2>&1
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2164; 1262 16385 DATABASE test postgres
pg_restore: error: could not execute query: ERROR: encoding "UTF8" does
not match locale "en_US"
DETAIL: The chosen LC_CTYPE setting requires encoding "LATIN1".
Command was: CREATE DATABASE "test" WITH TEMPLATE = template0 ENCODING =
'UTF8' LC_COLLATE = 'en_US' LC_CTYPE = 'en_US';

Is there a way around this while still using pg_upgrade? My understanding
is that a full dump and restore into a new DB with everything set to
en_US.UTF-8 is the only to convert these (if I have to dump/restore, I
wouldn't want to keep the mixed environment). Even with parallel jobs, I
imagine that's a bit of downtime but I'll have to wait until I can get a
copy of prod data to test with to be sure.

Is logical replication an option here? Either maintaining the mixed
environment or converting everything to en_US.UTF-8? I'm relatively new in
this shop but I'm told they didn't mean to use en_US and there's no reason
they wouldn't want to just use the standard/default UTF-8.

Thanks,
Don.

--
Don Seiler
www.seiler.us

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2020-03-16 14:33:24 Re: Invalid byte sequence errors on DB restore
Previous Message Peter J. Holzer 2020-03-16 13:39:42 Formatting output (was: Order by and timestamp)