Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-12-11 04:58:26
Message-ID: 20211211045826.bmtdqxn2xuk5l4yl@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2021-11-13 16:06:40 +0100, Dmitry Dolgov wrote:
> I got curious whether modifying Alexander's test case could reveal
> something interesting, and sprinkled it with savepoints and rollbacks.
> Almost immediately a new problem manifested itself, although the
> crash has nothing to do with the disconnected tuples as far as I can
> tell -- still probably worth mentioning. In this case vacuum invoked
> lazy_scan_prune, and during the first scan one of the chains had a
> HEAPTUPLE_DEAD tuple at the third position. The processing flow fell
> through to heap_prune_record_prunable and crashed on an assert with an
> InvalidTransactionId:
>
> #3 0x000055a2b260d1f9 in heap_prune_record_prunable (prstate=0x7ffd0c0ecdf0, xid=0) at pruneheap.c:872
> #4 0x000055a2b260ca72 in heap_prune_chain (buffer=2117, rootoffnum=150, prstate=0x7ffd0c0ecdf0) at pruneheap.c:695
> #5 0x000055a2b260bcd6 in heap_page_prune (relation=0x7fb98e217e20, buffer=2117, vistest=0x55a2b31d2d60 <GlobalVisCatalogRels>, old_snap_xmin=0, old_snap_ts=0, report_stats=false, off_loc=0x55a2b3e6a0cc) at pruneheap.c:288
> #6 0x000055a2b261309c in lazy_scan_prune (vacrel=0x55a2b3e6a060, buf=2117, blkno=192, page=0x7fb97856bf80 "", vistest=0x55a2b31d2d60 <GlobalVisCatalogRels>, prunestate=0x7ffd0c0ee9d0) at vacuumlazy.c:1739
>
> Applying heap_prune_record_prunable only if TransactionIdIsNormal seems
> to help. The original implementation didn't reach
> heap_prune_record_prunable either and also doesn't crash.
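
For readers following the thread, the workaround described in the quote can be sketched roughly as below. This is a standalone model, not PostgreSQL source and not the content of the 0001/0002 patches: the SketchPruneState struct and the sketch_* functions are hypothetical stand-ins, the macros are simplified copies of the ones in access/transam.h, and the wraparound comparison assumes both xids are normal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified copies of the transam.h definitions (assumed values). */
#define InvalidTransactionId       ((TransactionId) 0)
#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsValid(xid)  ((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Hypothetical stand-in for the prune state; only one field is modeled. */
typedef struct SketchPruneState
{
    TransactionId new_prune_xid;    /* oldest prunable xid seen so far */
} SketchPruneState;

/* Modulo-2^32 "precedes" comparison; valid for normal xids only. */
static bool
sketch_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Models the recording step: asserts on a non-normal xid, as in the backtrace. */
static void
sketch_record_prunable(SketchPruneState *prstate, TransactionId xid)
{
    assert(TransactionIdIsNormal(xid));
    if (!TransactionIdIsValid(prstate->new_prune_xid) ||
        sketch_xid_precedes(xid, prstate->new_prune_xid))
        prstate->new_prune_xid = xid;
}

/*
 * The guard from the quoted workaround: skip recording when the xid is
 * not a normal one (e.g. InvalidTransactionId). Returns true if recorded.
 */
static bool
sketch_record_prunable_if_normal(SketchPruneState *prstate, TransactionId xid)
{
    if (!TransactionIdIsNormal(xid))
        return false;
    sketch_record_prunable(prstate, xid);
    return true;
}
```

With the guard in place, an xid of 0 is ignored instead of tripping the assertion. Whether skipping is the right behaviour, rather than a sign that the chain should never reach this point, is exactly what the question below is about.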

Does your modified test still find problems with 0001 & 0002 from
https://postgr.es/m/20211211045710.ljtuu4gfloh754rs%40alap3.anarazel.de
applied?

Greetings,

Andres Freund
