From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, David Rowley <dgrowley(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17245: Index corruption involving deduplicated entries |
Date: | 2021-10-29 01:19:23 |
Message-ID: | 20211029011923.utmolntkasenzreh@alap3.anarazel.de |
Lists: | pgsql-bugs |
Hi,
It's not the cause of this problem, but I did find a minor issue: the retry
path in lazy_scan_prune() loses track of the deleted tuple count when it
retries.
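The following is a minimal Python model of how a prune-and-retry loop can lose a count. It assumes that each pass reassigns the count instead of adding to it. `FakePage`, `prune_page()`, `needs_retry()` and the two `scan_prune_*` functions are hypothetical and are not vacuumlazy.c code. The model only shows the general pattern.

```python
class FakePage:
    """Hypothetical page model, not PostgreSQL's heap page layout."""

    def __init__(self, dead_now, dead_later):
        self.dead_now = dead_now      # tuples removable by the next prune pass
        self.dead_later = dead_later  # tuples that only become removable after a retry
        self.retried = False


def prune_page(page):
    """Hypothetical prune pass: removes the currently dead tuples and
    returns how many were removed in THIS pass only."""
    removed = page.dead_now
    page.dead_now = 0
    return removed


def needs_retry(page):
    """Hypothetical retry trigger: on the first call, more tuples have
    become dead since pruning, so the caller must prune again."""
    if not page.retried and page.dead_later > 0:
        page.dead_now = page.dead_later
        page.dead_later = 0
        page.retried = True
        return True
    return False


def scan_prune_buggy(page):
    while True:
        tuples_deleted = prune_page(page)  # overwrites the earlier pass's count
        if not needs_retry(page):
            return tuples_deleted


def scan_prune_fixed(page):
    tuples_deleted = 0
    while True:
        tuples_deleted += prune_page(page)  # accumulates across passes
        if not needs_retry(page):
            return tuples_deleted
```

With `FakePage(3, 2)`, the buggy loop reports only the 2 tuples from the final pass. The fixed loop reports all 5.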
The retry codepath also made me wonder whether calling FreezeMultiXactId()
multiple times, because of a retry, could cause problems. I think we can end up
creating multiple multixactids for the same tuple (if the members change,
which is likely in the retry path). But that should be fine, I think.
On 2021-10-28 16:04:44 -0700, Peter Geoghegan wrote:
> > Didn't 14 change the logic when index vacuums are done? That could cause
> > previously existing issues to manifest with a higher likelihood.
>
> I don't follow. The new logic that skips index vacuuming kicks in 1)
> in an anti-wraparound vacuum emergency, and 2) when there are very few
> LP_DEAD line pointers in the heap. We can rule 1 out, I think, because
> the XIDs we see are in the low millions, and our starting point was a
> database that was upgraded via a dump and reload.
Right.
> The second criterion for skipping index vacuuming (the "less than 2% of
> heap pages have any LP_DEAD items" thing) might well have been hit on
> these tables -- it is after all very common. But I don't see how that
> could matter. We're never going to get to a code path inside
> vacuumlazy.c that sets LP_DEAD items from VACUUM's dead_tuples array
> to LP_UNUSED (how could we reach such a code path without also index
> vacuuming, given the way things are set up inside lazy_vacuum()?).
> We're always going to have the opportunity to do index vacuuming with
> any left-behind LP_DEAD line pointers in the next VACUUM -- right
> after the later VACUUM successfully returns from
> lazy_vacuum_all_indexes().
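The bypass decision described in the quoted text can be sketched as a simplified Python model. The model includes only the two conditions mentioned in this thread: the anti-wraparound failsafe, and fewer than 2% of heap pages having any LP_DEAD items. `should_bypass_index_vacuum()` and `vacuum()` are hypothetical names. The real lazy_vacuum() applies further conditions that this model leaves out. The sketch shows why a pass that skips index vacuuming leaves LP_DEAD items in place for a later VACUUM, instead of setting them LP_UNUSED.

```python
from enum import Enum


class LP(Enum):
    NORMAL = "normal"
    DEAD = "dead"
    UNUSED = "unused"


BYPASS_THRESHOLD_FRACTION = 0.02  # "less than 2% of heap pages have any LP_DEAD items"


def should_bypass_index_vacuum(rel_pages, lpdead_item_pages, failsafe_active):
    """Hypothetical, simplified bypass rule (not PostgreSQL's exact logic)."""
    if failsafe_active:
        return True
    return lpdead_item_pages < rel_pages * BYPASS_THRESHOLD_FRACTION


def vacuum(pages, failsafe_active=False):
    """pages: list of pages, each a list of line pointer states.
    Returns True only if indexes were vacuumed (and LP_DEAD -> LP_UNUSED)."""
    lpdead_item_pages = sum(1 for page in pages if LP.DEAD in page)
    if lpdead_item_pages == 0:
        return False
    if should_bypass_index_vacuum(len(pages), lpdead_item_pages, failsafe_active):
        return False  # LP_DEAD items stay put; a later VACUUM handles them
    # Index vacuuming would happen here; only after it may heap items be reused.
    for page in pages:
        for i, lp in enumerate(page):
            if lp is LP.DEAD:
                page[i] = LP.UNUSED
    return True
```

In this model, the only way an item becomes LP_UNUSED is through the branch that also (notionally) vacuums indexes.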
Shrug. It doesn't seem that hard to believe that repeatedly trying to prune
the same page could unearth some bugs. E.g. via the heap_prune_record_unused()
path in heap_prune_chain().
Hm. I assume somebody verified that old_snapshot_threshold is not in use?
Seems unlikely, but wrongly entering that heap_prune_record_unused() path
could certainly cause issues like the ones we're observing.
Greetings,
Andres Freund