From: | "Tharakan, Robins" <tharar(at)amazon(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | pg_upgrade failing for 200+ million Large Objects |
Date: | 2021-03-03 11:36:26 |
Message-ID: | 12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com |
Lists: pgsql-hackers
Hi,
While reviewing a failed upgrade from Postgres v9.5 (to v9.6), I saw that the
instance had ~200 million (in-use) Large Objects. I was able to reproduce
this on a test instance, which failed with a similar error:
pg_restore: executing BLOB 4980622
pg_restore: WARNING: database with OID 0 must be vacuumed within 1000001
transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that
database.
You might also need to commit or roll back old prepared transactions.
pg_restore: executing BLOB 4980623
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2565; 2613 4980623 BLOB
4980623 postgres
pg_restore: [archiver (db)] could not execute query: ERROR: database is not
accepting commands to avoid wraparound data loss in database with OID 0
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.
Command was: SELECT pg_catalog.lo_create('4980623');
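(In case anyone wants to reproduce this: a test instance can be populated
with something along these lines. The count below is illustrative, and
lo_create(0) makes the server assign each OID.)

    -- create ~150 million large objects with server-assigned OIDs
    SELECT lo_create(0) FROM generate_series(1, 150000000);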
To rule out the obvious possibilities: these Large Objects are still in use
(so vacuumlo wouldn't help); giving the instance more system resources
doesn't help; moving Large Objects to another database doesn't help (since
this is a cluster-wide restriction); the source instance is nowhere close to
wraparound; and lastly, the most recent minor versions don't help either (I
compiled 9_6_STABLE and upgraded a database with 150 million LOs, and still
encountered the same issue).
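(The wraparound headroom on the source can be confirmed with the usual
catalog query, e.g.:)

    -- per-database XID age; values well below autovacuum_freeze_max_age
    -- mean wraparound is nowhere close
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;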
Do let me know if I am missing something obvious, but it appears this is
happening owing to two things coming together:
* Each Large Object is migrated in its own transaction during pg_upgrade.
* pg_resetxlog narrows the window available to pg_upgrade to ~146 million
XIDs: 2^31, minus the 1-million-XID wraparound margin, minus a hard-coded
2-billion constant (see [1]), in what appears to be an attempt to force an
anti-wraparound autovacuum soon after the upgrade completes (arithmetic
spelled out below).
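Spelling that arithmetic out (the 2-billion figure is the hard-coded
constant at [1]; per the comment there, it corresponds to the maximum
allowed autovacuum_freeze_max_age):

    -- 2^31 XID horizon, minus the hard-coded 2-billion offset [1],
    -- minus the ~1 million XID stop-limit safety margin
    SELECT 2147483648 - 2000000000 - 1000000 AS usable_xids;
    -- => 146483648 (~146 million)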
Such an XID-based restriction is limiting for an instance that actively uses
a lot of Large Objects. Besides forcing the anti-wraparound autovacuum logic
to kick in soon after the upgrade, I am unclear what else it aims to do.
What it does seem to do is block major-version upgrades when the pre-upgrade
instance has >146 million Large Objects (half that, if each object
additionally requires an ALTER LARGE OBJECT OWNER TO during pg_restore; see
the illustration below).
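(To illustrate: for each object, pg_restore issues statements along these
lines. The lo_create() call is taken from the log above; the role name here
is a made-up example, and the ALTER is emitted only when an owner has to be
set, which is what doubles the XID consumption per object.)

    SELECT pg_catalog.lo_create('4980623');
    ALTER LARGE OBJECT 4980623 OWNER TO app_owner;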
For the long term, these ideas came to mind, although I am unsure which are
low-hanging fruit and which are outright impossible: for e.g. clubbing
multiple objects into one transaction [2], forcing an autovacuum post-upgrade
(and thus removing this limitation altogether), or seeing whether
"pg_resetxlog -x" (from within pg_upgrade) could help in some way to work
around this limitation (a sketch of what I mean follows).
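(For that last idea, I mean an invocation along these lines against the
stopped new cluster; purely a sketch, with placeholder values.)

    # sketch: set the next transaction ID on a stopped cluster
    pg_resetxlog -x <next-xid> <new-cluster-datadir>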
Is there a short-term recommendation for this scenario?
I can understand that a high number of small objects is not a great way to
use pg_largeobject (since Large Objects were intended for, well, 'large
objects'), but this magic number of Large Objects is now a stalemate (with
respect to the v9.5 EOL).
Reference:
1) pg_resetxlog - https://github.com/postgres/postgres/blob/ca3b37487be333a1d241dab1bbdd17a211a88f43/src/bin/pg_resetwal/pg_resetwal.c#L444
2) https://www.postgresql.org/message-id/ed7d86a1-b907-4f53-9f6e-63482d2f2bac%40manitou-mail.org
-
Thanks
Robins Tharakan