Re: BUG #18351: VACUUM FULL fails with error: missing chunk number 0 for toast value XXX

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18351: VACUUM FULL fails with error: missing chunk number 0 for toast value XXX
Date: 2024-02-27 09:00:01
Message-ID: 80abbbf0-588b-f808-c264-77c3ebee83f3@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

17.02.2024 23:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 18351
>
> The following script:
> ...
> VACUUM tbl;
> VACUUM FULL tbl;
> " | psql -d db$c >psql-2-$c.log 2>&1 &
> done
> wait
> grep 'missing chunk number' server.log && break;
> done
>

> So it looks like recently dead tuples might have no correspondent toast
> values to detoast attributes of such tuples.
>
> Reproduced on REL_12_STABLE .. master.
> Probably, the same issue was reported some time ago:
> https://www.postgresql.org/message-id/flat/CALdSSPhmqoN02ciT4UxS6ax0N84NpRwPWm87nKJ_%2B0G-Na8qOQ%40mail.gmail.com

I've discovered the old bug #5998
https://www.postgresql.org/message-id/flat/201104291506.p3TF6dA9003698%40wwwmaster.postgresql.org
with similar symptoms and a very relevant question:
> So, how is it that the vacuum removed the underlying toast data but not
> the putatively-recently-dead tuple?

In fact, the script presented here, produces the error even on 83b758494,
so this anomaly is pretty old.
But for the given case, the answer to that question is different.
With additional debug and verbose logging (on REL_13_STABLE, as
lazy_scan_heap() was more flat back then) I see:

2024-02-27 08:22:04.859 UTC|law|db9|65dd9bac.12914b|LOG:  statement: VACUUM (VERBOSE) tbl;

2024-02-27 08:22:04.859 UTC|law|db9|65dd9bac.12914b|LOG: !!!vacuum_rel: relid: 16980, snapshot->xmin: 845,
snapshot->xmax: 865

2024-02-27 08:22:04.859 UTC|law|db9|65dd9bac.12914b|LOG: !!!lazy_scan_heap| relname: tbl, blkno: 0, offnum: 1,
ItemIdIsDead(itemid): 0

2024-02-27 08:22:04.859 UTC|law|db9|65dd9bac.12914b|LOG: !!!lazy_scan_heap| relname: tbl, blkno: 0, offnum: 1,
tuple.t_data->t_choice.t_heap.t_xmin: 859, tuple.t_data->t_choice.t_heap.t_xmax: 864, OldestXmin: 845,
HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf): 2
### The tbl's tuple is RECENTLY_DEAD ###

2024-02-27 08:22:04.860 UTC|law|db9|65dd9bac.12914b|INFO:  "tbl": found 0 removable, 1 nonremovable row versions in 1
out of 1 pages
2024-02-27 08:22:04.860 UTC|law|db9|65dd9bac.12914b|DETAIL:  1 dead row versions cannot be removed yet, oldest xmin: 845
    There were 0 unused item identifiers.
    Skipped 0 pages due to buffer pins, 0 frozen pages.
    0 pages are entirely empty.
    CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

2024-02-27 08:22:04.860 UTC|law|db9|65dd9bac.12914b|LOG: !!!lazy_scan_heap| relname: pg_toast_16980, blkno: 0, offnum:
1, ItemIdIsDead(itemid): 1
2024-02-27 08:22:04.860 UTC|law|db9|65dd9bac.12914b|CONTEXT:  while scanning block 0 of relation "pg_toast.pg_toast_16980"

2024-02-27 08:22:04.860 UTC|law|db9|65dd9bac.12914b|LOG: !!!lazy_scan_heap| relname: pg_toast_16980, blkno: 0, offnum:
2, ItemIdIsDead(itemid): 1
2024-02-27 08:22:04.860 UTC|law|db9|65dd9bac.12914b|CONTEXT:  while scanning block 0 of relation "pg_toast.pg_toast_16980"
### But the toast's tuples are DEAD according to their lp_flags ###

2024-02-27 08:22:04.860 UTC|law|db9|65dd9bac.12914b|INFO: "pg_toast_16980": found 2 removable, 0 nonremovable row
versions in 1 out of 1 pages
2024-02-27 08:22:04.860 UTC|law|db9|65dd9bac.12914b|DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 866
    There were 0 unused item identifiers.
    Skipped 0 pages due to buffer pins, 0 frozen pages.
    0 pages are entirely empty.
    CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

2024-02-27 08:22:04.861 UTC|law|db9|65dd9bac.12914b|STATEMENT: VACUUM FULL tbl;

2024-02-27 08:22:04.862 UTC|law|db9|65dd9bac.12914b|ERROR:  missing chunk number 0 for toast value 17045 in pg_toast_16980

So in this case, the heap tuple and the pg_toast_xxx tuples are out-of-sync
in regard to their DEAD status after DELETE, then VACUUM removes dead
tuples from pg_toast_xxx, and the following VACUUM FULL
(heapam_relation_copy_for_cluster()) fails to copy the heap tuple in
absence of corresponding toast tuples.

Best regards,
Alexander

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message jian he 2024-02-27 10:17:28 Re: BUG #18314: PARALLEL UNSAFE function does not prevent parallel index build
Previous Message Matti Aarnio 2024-02-27 08:12:45 Re: BUG #18366: Not present at postgresql.org YUM repository