Re: ERROR: unexpected chunk number 452 (expected 0) for toast value 94674063 in pg_toast_56980977

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Naveen Kumar <naveenchowdaryon(at)gmail(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: ERROR: unexpected chunk number 452 (expected 0) for toast value 94674063 in pg_toast_56980977
Date: 2018-07-18 19:58:50
Message-ID: 20180718195850.axhzokgbggoxpcmr@alvherre.pgsql
Lists: pgsql-admin

On 2018-Jul-18, Naveen Kumar wrote:

> Hello Experts,
>
> What could be the reason for the below errors? Can anyone help me to fix this?
>
> PostgreSQL Database: [~rptdb~] Data Backup Failed with PostgreSQL Error:
> [~pg_dump: Dumping the contents of table "document" failed: PQgetResult()
> failed.
> pg_dump: Error message from server: ERROR: unexpected chunk number 452
> (expected 0) for toast value 94674063 in pg_toast_56980977
> pg_dump: The command was: COPY reports_extended.document (document_id,
> access_key, created_date, document_name, document_size, document_status,
> document_type, encryption_type, external_system_storage_id,
> external_system_storage_url, last_updated_date, md5_hash,
> source_system_name, storage_type, created_by, customer_org_id,
> content_type, tags, file, ownedby) TO stdout;~].

There are two related bugfixes in 9.6.9 (and in all the other versions
released alongside it). You'd do well to upgrade to the latest minor
release of whatever supported branch you're using, and then we can
discuss further. If you were a victim of this bug, I'm not sure there
is any direct mitigation; you'll probably need to UPDATE the offending
rows to set the affected columns to NULL (or some other value). Eight
years ago (!) I wrote a function to scan for such rows, quoted in this
blog post (sorry, it's in Spanish):
https://alvherre.livejournal.com/4404.html
It may be helpful. (I think it needs a minor fix to run on current
releases.)
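
For what it's worth, here is a minimal sketch of that kind of scan,
written against the table and columns from the pg_dump output above
rather than the function in the blog post. I'm guessing that
document_id identifies a row and that "file" is the TOASTed column, so
adjust both to match your schema:

-- Force full detoasting of a suspect column for every row and report
-- the rows whose TOAST data cannot be read back.
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT document_id FROM reports_extended.document LOOP
        BEGIN
            -- md5() over the casted value has to read every TOAST chunk,
            -- so a missing or duplicated chunk raises the error here.
            PERFORM md5(file::text)
               FROM reports_extended.document
              WHERE document_id = r.document_id;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'bad TOAST data for document_id = %: %',
                         r.document_id, SQLERRM;
        END;
    END LOOP;
END;
$$;

Any document_id it reports is a candidate for the UPDATE-to-NULL
treatment described above.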

Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Branch: master Release: REL_11_BR [d1e907929] 2018-04-11 18:11:29 -0400
Branch: REL_10_STABLE Release: REL_10_4 [08e6cda1c] 2018-04-11 18:11:29 -0400
Branch: REL9_6_STABLE Release: REL9_6_9 [060bb38d0] 2018-04-11 18:11:30 -0400
Branch: REL9_5_STABLE Release: REL9_5_13 [efbe36a2c] 2018-04-11 18:11:30 -0400
Branch: REL9_4_STABLE Release: REL9_4_18 [6943fb927] 2018-04-11 18:11:30 -0400
Branch: REL9_3_STABLE Release: REL9_3_23 [66d4b6bb8] 2018-04-11 18:11:30 -0400

Ignore nextOid when replaying an ONLINE checkpoint.

The nextOid value is from the start of the checkpoint and may well be stale
compared to values from more recent XLOG_NEXTOID records. Previously, we
adopted it anyway, allowing the OID counter to go backwards during a crash.
While this should be harmless, it contributed to the severity of the bug
fixed in commit 0408e1ed5, by allowing duplicate TOAST OIDs to be assigned
immediately following a crash. Without this error, that issue would only
have arisen when TOAST objects just younger than a multiple of 2^32 OIDs
were deleted and then not vacuumed in time to avoid a conflict.

Pavan Deolasee

Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com
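
As an aside, on 9.6 and later you can check the OID counter value
recorded by the most recent checkpoint, which is roughly where an
unpatched recovery of an online checkpoint would have reset the
counter to:

SELECT checkpoint_time, next_oid
  FROM pg_control_checkpoint();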

Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Branch: master Release: REL_11_BR [0408e1ed5] 2018-04-11 17:41:22 -0400
Branch: REL_10_STABLE Release: REL_10_4 [5a11bf970] 2018-04-11 17:41:23 -0400
Branch: REL9_6_STABLE Release: REL9_6_9 [8bba10f7e] 2018-04-11 17:41:25 -0400
Branch: REL9_5_STABLE Release: REL9_5_13 [3767216fb] 2018-04-11 17:41:26 -0400
Branch: REL9_4_STABLE Release: REL9_4_18 [5b3ed6b78] 2018-04-11 17:41:27 -0400
Branch: REL9_3_STABLE Release: REL9_3_23 [7448e7e23] 2018-04-11 17:41:28 -0400

Do not select new object OIDs that match recently-dead entries.

When selecting a new OID, we take care to avoid picking one that's already
in use in the target table, so as not to create duplicates after the OID
counter has wrapped around. However, up to now we used SnapshotDirty when
scanning for pre-existing entries. That ignores committed-dead rows, so
that we could select an OID matching a deleted-but-not-yet-vacuumed row.
While that mostly worked, it has two problems:

* If recently deleted, the dead row might still be visible to MVCC
snapshots, creating a risk for duplicate OIDs when examining the catalogs
within our own transaction. Such duplication couldn't be visible outside
the object-creating transaction, though, and we've heard few if any field
reports corresponding to such a symptom.

* When selecting a TOAST OID, deleted toast rows definitely *are* visible
to SnapshotToast, and will remain so until vacuumed away. This leads to
a conflict that will manifest in errors like "unexpected chunk number 0
(expected 1) for toast value nnnnn". We've been seeing reports of such
errors from the field for years, but the cause was unclear before.

The fix is simple: just use SnapshotAny to search for conflicting rows.
This results in a slightly longer window before object OIDs can be
recycled, but that seems unlikely to create any large problems.

Pavan Deolasee

Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com
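
To tie this back to the error in your report: you can look at the
chunks the server actually has for the complaining TOAST value, using
the toast table and value OID from the error message (reading a toast
table directly normally requires a superuser):

SELECT chunk_id, chunk_seq, length(chunk_data) AS chunk_bytes
  FROM pg_toast.pg_toast_56980977
 WHERE chunk_id = 94674063
 ORDER BY chunk_seq;

For an intact value, chunk_seq runs 0, 1, 2, ... with no gaps or
duplicates; anything else confirms that the stored chunks for that
value are inconsistent.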

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
