From: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Date: | 2021-11-13 15:06:40 |
Message-ID: | 20211113150640.vk5zhjangylufxaa@localhost |
Lists: | pgsql-bugs |
> On Fri, Nov 12, 2021 at 02:46:22PM -0800, Peter Geoghegan wrote:
> On Fri, Nov 12, 2021 at 2:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > Naturally, I also went through the exercise of trying to find a
> > > counterexample, where pruning doesn't see a disconnected tuple as DEAD
> > > in its HTSV. I could not get the assertion to fail with Alexander's
> > > test case, nor with make check-world.
> >
> > I don't think that provides a meaningful coverage. Alexander's test has a
> > quite limited set operations (which e.g. doesn't include an subxacts), and our
> > own tests around subtransactions, and particularly concurrent subtransaction
> > heavy work, is quite, uh, minimal.
>
> It's a start.
>
> We need to be pragmatic here. There is some uncertainty about what
> HTSV might say about a disconnected tuple in the absence of
> corruption, or there is a risk of a new problem like that coming up in
> the future -- let's work within those confines, then. What do you want
> to do about that? There aren't that many choices, since, to repeat,
> the tuple is "morally" DEAD no matter what. Even with corruption, even
> without corruption in the presence of some unanticipated corner case
> with HTSV -- this is fundamental.
I got curious whether modifying Alexander's test case could reveal
something interesting, so I sprinkled it with savepoints and rollbacks.
Almost immediately a new problem manifested itself, although the
crash has nothing to do with disconnected tuples as far as I can
tell -- still probably worth mentioning. In this case vacuum invoked
lazy_scan_prune, and during the first scan one of the chains had a
HEAPTUPLE_DEAD tuple at the third position. The processing flow fell
through to heap_prune_record_prunable, which was passed
InvalidTransactionId as the xid and crashed on an assertion:
#3 0x000055a2b260d1f9 in heap_prune_record_prunable (prstate=0x7ffd0c0ecdf0, xid=0) at pruneheap.c:872
#4 0x000055a2b260ca72 in heap_prune_chain (buffer=2117, rootoffnum=150, prstate=0x7ffd0c0ecdf0) at pruneheap.c:695
#5 0x000055a2b260bcd6 in heap_page_prune (relation=0x7fb98e217e20, buffer=2117, vistest=0x55a2b31d2d60 <GlobalVisCatalogRels>, old_snap_xmin=0, old_snap_ts=0, report_stats=false, off_loc=0x55a2b3e6a0cc) at pruneheap.c:288
#6 0x000055a2b261309c in lazy_scan_prune (vacrel=0x55a2b3e6a060, buf=2117, blkno=192, page=0x7fb97856bf80 "", vistest=0x55a2b31d2d60 <GlobalVisCatalogRels>, prunestate=0x7ffd0c0ee9d0) at vacuumlazy.c:1739
Calling heap_prune_record_prunable only if the xid satisfies
TransactionIdIsNormal seems to help. The original implementation didn't
reach heap_prune_record_prunable either, and doesn't crash.
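
To make the guard concrete, here is a minimal standalone sketch of the idea, not the actual patch. The xid macros mirror the definitions in access/transam.h; PruneStateSketch, xid_precedes_sketch, record_prunable_sketch and record_prunable_if_normal are hypothetical stand-ins invented for this illustration, and the body of record_prunable_sketch is only loosely modeled on heap_prune_record_prunable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins mirroring the definitions in PostgreSQL's access/transam.h. */
typedef uint32_t TransactionId;
#define InvalidTransactionId       ((TransactionId) 0)
#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsValid(xid)  ((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Hypothetical cut-down PruneState: only the field this sketch needs. */
typedef struct PruneStateSketch
{
    TransactionId new_prune_xid;
} PruneStateSketch;

/* Simplified wraparound-aware comparison; assumes both xids are normal. */
static bool
xid_precedes_sketch(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Loose model of heap_prune_record_prunable: it asserts that the xid is
 * normal, which is the check that fails when xid=0 is passed in.
 */
static void
record_prunable_sketch(PruneStateSketch *prstate, TransactionId xid)
{
    assert(TransactionIdIsNormal(xid));
    if (!TransactionIdIsValid(prstate->new_prune_xid) ||
        xid_precedes_sketch(xid, prstate->new_prune_xid))
        prstate->new_prune_xid = xid;
}

/*
 * Hypothetical guard at the call site: skip recording when the xid is
 * not normal. Returns true if the xid was recorded.
 */
static bool
record_prunable_if_normal(PruneStateSketch *prstate, TransactionId xid)
{
    if (!TransactionIdIsNormal(xid))
        return false;
    record_prunable_sketch(prstate, xid);
    return true;
}
```

With this guard, passing InvalidTransactionId leaves new_prune_xid untouched instead of tripping the assertion, while normal xids are still tracked as before. Whether skipping the call is the right fix, as opposed to making sure the xid is never invalid at that point, is a separate question.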