From: | Don Seiler <don(at)seiler(dot)us> |
---|---|
To: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Mixed Locales and Upgrading |
Date: | 2020-03-16 14:24:13 |
Message-ID: | CAHJZqBAS6kuTWZwSbu8KqfhJ8dapvTegyA1QweAHAmmQveUyRA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Good morning,
I have a few clusters that need to be upgraded from PG 9.6 or 10 to 12.
Normally I'd just plan to pg_upgrade the lot and be good to go. However
I've found that quite a few (including our biggest/busiest database) have
mixed locales. In the case of the biggest/busiest database, the cluster was
created with locale en_US (NOT en_US.UTF-8), and so the databases have
encoding LATIN1.
However this database has encoding UTF8 while still having ctype and
collation of en_US. I've since found that when this was last upgraded, they
ran "update pg_database set encoding = pg_char_to_encoding('UTF8') where
datname = 'test';" to change the encoding. In my testing, pg_upgrade breaks
when trying to restore this since UTF8 isn't supported in en_US for the CREATE
DATABASE command used during pg_restore:
command: "/usr/lib/postgresql/12/bin/pg_restore" --host /var/lib/postgresql
--port 50432 --username postgres --create --exit-on-error --verbose
--dbname template1 "pg_upgrade_dump_16385.custom" >>
"pg_upgrade_dump_16385.log" 2>&1
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2164; 1262 16385 DATABASE test postgres
pg_restore: error: could not execute query: ERROR: encoding "UTF8" does
not match locale "en_US"
DETAIL: The chosen LC_CTYPE setting requires encoding "LATIN1".
Command was: CREATE DATABASE "test" WITH TEMPLATE = template0 ENCODING =
'UTF8' LC_COLLATE = 'en_US' LC_CTYPE = 'en_US';
Is there a way around this while still using pg_upgrade? My understanding
is that a full dump and restore into a new DB with everything set to
en_US.UTF-8 is the only to convert these (if I have to dump/restore, I
wouldn't want to keep the mixed environment). Even with parallel jobs, I
imagine that's a bit of downtime but I'll have to wait until I can get a
copy of prod data to test with to be sure.
Is logical replication an option here? Either maintaining the mixed
environment or converting everything to en_US.UTF-8? I'm relatively new in
this shop but I'm told they didn't mean to use en_US and there's no reason
they wouldn't want to just use the standard/default UTF-8.
Thanks,
Don.
--
Don Seiler
www.seiler.us
From | Date | Subject | |
---|---|---|---|
Next Message | Adrian Klaver | 2020-03-16 14:33:24 | Re: Invalid byte sequence errors on DB restore |
Previous Message | Peter J. Holzer | 2020-03-16 13:39:42 | Formatting output (was: Order by and timestamp) |