From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Assert in heapgettup_pagemode() fails due to underlying buffer change |
Date: | 2024-06-06 10:00:00 |
Message-ID: | 7ed10231-ce47-03d5-d3f9-4aea0dc7d5a4@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello hackers,
I tried to investigate a recent buildfarm test failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-06-04%2003%3A27%3A47
29/295 postgresql:recovery / recovery/026_overwrite_contrecord ERROR 39.55s exit status 32
log/026_overwrite_contrecord_standby.log
TRAP: failed Assert("ItemIdIsNormal(lpp)"), File: "../pgsql/src/backend/access/heap/heapam.c", Line: 1002, PID: 3740958
postgres: standby: bf postgres [local] startup(ExceptionalCondition+0x81)[0x56c60bf9]
postgres: standby: bf postgres [local] startup(+0xf776e)[0x5667276e]
postgres: standby: bf postgres [local] startup(heap_getnextslot+0x40)[0x56672ee1]
postgres: standby: bf postgres [local] startup(+0x11c218)[0x56697218]
postgres: standby: bf postgres [local] startup(systable_getnext+0xfa)[0x56697c1a]
postgres: standby: bf postgres [local] startup(+0x6d29c7)[0x56c4d9c7]
postgres: standby: bf postgres [local] startup(+0x6d372c)[0x56c4e72c]
postgres: standby: bf postgres [local] startup(+0x6d8288)[0x56c53288]
postgres: standby: bf postgres [local] startup(RelationCacheInitializePhase3+0x149)[0x56c52d71]
(It's not the only failure of that ilk in the buildfarm.)
and managed to reproduce the failure by running many
026_overwrite_contrecord tests in parallel (with fsync=on).
Analyzing the core dump added some info:
...
#3 0x0000000000bb43cc in ExceptionalCondition (conditionName=0xc45c77 "ItemIdIsNormal(lpp)",
fileName=0xc45aa8 "heapam.c", lineNumber=1002) at assert.c:66
#4 0x00000000004f7f13 in heapgettup_pagemode (scan=0x19f5660, dir=ForwardScanDirection, nkeys=2, key=0x19f61d0)
at heapam.c:1002
#5 0x00000000004f86d1 in heap_getnextslot (sscan=0x19f5660, direction=ForwardScanDirection, slot=0x19f5da0)
at heapam.c:1307
#6 0x000000000051d028 in table_scan_getnextslot (sscan=0x19f5660, direction=ForwardScanDirection, slot=0x19f5da0)
at ../../../../src/include/access/tableam.h:1081
#7 0x000000000051da80 in systable_getnext (sysscan=0x19f5470) at genam.c:530
#8 0x0000000000ba0937 in RelationBuildTupleDesc (relation=0x7fa004feea88) at relcache.c:572
#9 0x0000000000ba17b9 in RelationBuildDesc (targetRelId=2679, insertIt=true) at relcache.c:1184
#10 0x0000000000ba6520 in load_critical_index (indexoid=2679, heapoid=2610) at relcache.c:4353
#11 0x0000000000ba607d in RelationCacheInitializePhase3 () at relcache.c:4132
#12 0x0000000000bcb704 in InitPostgres (in_dbname=0x196ca30 "postgres", dboid=5, username=0x19a91b8 "law", useroid=0,
flags=1, out_dbname=0x0) at postinit.c:1193
...
(gdb) frame 4
(gdb) p lpp->lp_flags
$2 = 1
(gdb) p ItemIdIsNormal(lpp)
$12 = 1
So it looks like the Assert had failed when lpp->lp_flags had some other
contents...
I added the following debugging code:
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -995,10 +995,14 @@ continue_page:
for (; linesleft > 0; linesleft--, lineindex += dir)
{
ItemId lpp;
+ ItemIdData iid;
OffsetNumber lineoff;
lineoff = scan->rs_vistuples[lineindex];
lpp = PageGetItemId(page, lineoff);
+ iid = *((ItemIdData *)lpp);
+
+ Assert(ItemIdIsNormal(&iid));
Assert(ItemIdIsNormal(lpp));
and got:
...
#2 0x000055b68dc6998c in ExceptionalCondition (conditionName=0x55b68dcfe5f7 "ItemIdIsNormal(&iid)",
fileName=0x55b68dcfe428 "heapam.c", lineNumber=1010) at assert.c:66
#3 0x000055b68d588a78 in heapgettup_pagemode (scan=0x55b68f0905e0, dir=ForwardScanDirection, nkeys=2,
key=0x55b68f091150) at heapam.c:1010
#4 0x000055b68d58930e in heap_getnextslot (sscan=0x55b68f0905e0, direction=ForwardScanDirection, slot=0x55b68f090d20)
at heapam.c:1322
...
(gdb) frame 3
#3 0x000055b68d588a78 in heapgettup_pagemode (...) at heapam.c:1010
1010 Assert(ItemIdIsNormal(&iid));
(gdb) info locals
lpp = 0x7f615c34b0ec
iid = {lp_off = 0, lp_flags = 0, lp_len = 0}
lineoff = 54
tuple = 0x55b68f090638
page = 0x7f615c34b000 ""
(gdb) p *lpp
$1 = {lp_off = 3160, lp_flags = 1, lp_len = 136}
It seemingly confirms that the underlying memory was changed while being
processed in heapgettup_pagemode().
I've tried to add checks for the page buffer content as below:
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -953,11 +953,15 @@ heapgettup_pagemode(HeapScanDesc scan,
Page page;
int lineindex;
int linesleft;
+char page_copy[BLCKSZ];
if (likely(scan->rs_inited))
{
/* continue from previously returned page/tuple */
page = BufferGetPage(scan->rs_cbuf);
+memcpy(page_copy, page, BLCKSZ);
+for (int i = 0; i < 100; i++)
+Assert(memcmp(page_copy, page, BLCKSZ) == 0);
lineindex = scan->rs_cindex + dir;
if (ScanDirectionIsForward(dir))
@@ -986,6 +990,10 @@ heapgettup_pagemode(HeapScanDesc scan,
/* prune the page and determine visible tuple offsets */
heap_prepare_pagescan((TableScanDesc) scan);
page = BufferGetPage(scan->rs_cbuf);
+memcpy(page_copy, page, BLCKSZ);
+for (int i = 0; i < 100; i++)
+Assert(memcmp(page_copy, page, BLCKSZ) == 0);
+
linesleft = scan->rs_ntuples;
lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
and got the assertion failures even during `make check`:
...
#5 0x00005577f29e0bc4 in ExceptionalCondition (
conditionName=conditionName(at)entry=0x5577f2a4a5d0 "memcmp(page_copy, page, BLCKSZ) == 0",
fileName=fileName(at)entry=0x5577f2a4aa38 "heapam.c", lineNumber=lineNumber(at)entry=966) at assert.c:66
#6 0x00005577f24faa68 in heapgettup_pagemode (scan=scan(at)entry=0x5577f46574e8, dir=ForwardScanDirection, nkeys=0,
key=0x0) at heapam.c:966
...
(gdb) frame 6
#6 0x00005577f24faa68 in heapgettup_pagemode (...) at heapam.c:966
966 Assert(memcmp(page_copy, page, BLCKSZ) == 0);
(gdb) p i
$1 = 25
Am I missing something or the the page buffer indeed lacks locking there?
Best regards,
Alexander
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2024-06-06 10:13:33 | Re: Conflict Detection and Resolution |
Previous Message | Anthonin Bonnefoy | 2024-06-06 09:52:59 | Re: Remove dependency on VacuumPage(Hit/Miss/Dirty) counters in do_analyze_rel |