Re: longfin and tamandua aren't too happy but I'm not sure why

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: longfin and tamandua aren't too happy but I'm not sure why
Date: 2022-09-27 20:35:17
Message-ID: 3825454.1664310917@sss.pgh.pa.us
Lists: pgsql-hackers

Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> On Tue, Sep 27, 2022 at 02:55:18PM -0400, Robert Haas wrote:
>> Both animals are running with -fsanitize=alignment and it's not
>> difficult to believe that the commit mentioned above could have
>> introduced an alignment problem where we didn't have one before, but
>> without a stack backtrace I don't know how to track it down. I tried
>> running those tests locally with -fsanitize=alignment and they passed.

> There's one here:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-09-27%2018%3A43%3A06

On longfin's host, the test_decoding run produces two core files.
One has a backtrace like this:

* frame #0: 0x000000010a36af8c postgres`ParseCommitRecord(info='\x80', xlrec=0x00007fa0678a8090, parsed=0x00007ff7b5c50e78) at xactdesc.c:102:30
frame #1: 0x000000010a765f9e postgres`xact_decode(ctx=0x00007fa0680d9118, buf=0x00007ff7b5c51000) at decode.c:201:5 [opt]
frame #2: 0x000000010a765d17 postgres`LogicalDecodingProcessRecord(ctx=0x00007fa0680d9118, record=<unavailable>) at decode.c:119:3 [opt]
frame #3: 0x000000010a76d890 postgres`pg_logical_slot_get_changes_guts(fcinfo=<unavailable>, confirm=true, binary=false) at logicalfuncs.c:271:5 [opt]
frame #4: 0x000000010a76d320 postgres`pg_logical_slot_get_changes(fcinfo=<unavailable>) at logicalfuncs.c:338:9 [opt]
frame #5: 0x000000010a5a521d postgres`ExecMakeTableFunctionResult(setexpr=<unavailable>, econtext=0x00007fa068098f50, argContext=<unavailable>, expectedDesc=0x00007fa06701ba38, randomAccess=<unavailable>) at execSRF.c:234:13 [opt]
frame #6: 0x000000010a5c405b postgres`FunctionNext(node=0x00007fa068098d40) at nodeFunctionscan.c:95:5 [opt]
frame #7: 0x000000010a5a61b9 postgres`ExecScan(node=0x00007fa068098d40, accessMtd=(postgres`FunctionNext at nodeFunctionscan.c:61), recheckMtd=(postgres`FunctionRecheck at nodeFunctionscan.c:251)) at execScan.c:199:10 [opt]
frame #8: 0x000000010a596ee0 postgres`standard_ExecutorRun [inlined] ExecProcNode(node=0x00007fa068098d40) at executor.h:259:9 [opt]
frame #9: 0x000000010a596eb8 postgres`standard_ExecutorRun [inlined] ExecutePlan(estate=<unavailable>, planstate=0x00007fa068098d40, use_parallel_mode=<unavailable>, operation=CMD_SELECT, sendTuples=<unavailable>, numberTuples=0, direction=1745456112, dest=0x00007fa067023848, execute_once=<unavailable>) at execMain.c:1636:10 [opt]
frame #10: 0x000000010a596e2a postgres`standard_ExecutorRun(queryDesc=<unavailable>, direction=1745456112, count=0, execute_once=<unavailable>) at execMain.c:363:3 [opt]

and the other like this:

* frame #0: 0x000000010a36af8c postgres`ParseCommitRecord(info='\x80', xlrec=0x00007fa06783a090, parsed=0x00007ff7b5c50040) at xactdesc.c:102:30
frame #1: 0x000000010a3cd24d postgres`xact_redo(record=0x00007fa0670096c8) at xact.c:6161:3
frame #2: 0x000000010a41770d postgres`ApplyWalRecord(xlogreader=0x00007fa0670096c8, record=0x00007fa06783a060, replayTLI=0x00007ff7b5c507f0) at xlogrecovery.c:1897:2
frame #3: 0x000000010a4154be postgres`PerformWalRecovery at xlogrecovery.c:1728:4
frame #4: 0x000000010a3e0dc7 postgres`StartupXLOG at xlog.c:5473:3
frame #5: 0x000000010a7498a0 postgres`StartupProcessMain at startup.c:267:2 [opt]
frame #6: 0x000000010a73e2cb postgres`AuxiliaryProcessMain(auxtype=StartupProcess) at auxprocess.c:141:4 [opt]
frame #7: 0x000000010a745b97 postgres`StartChildProcess(type=StartupProcess) at postmaster.c:5408:3 [opt]
frame #8: 0x000000010a7487e2 postgres`PostmasterStateMachine at postmaster.c:4006:16 [opt]
frame #9: 0x000000010a745804 postgres`reaper(postgres_signal_arg=<unavailable>) at postmaster.c:3256:2 [opt]
frame #10: 0x00007ff815b16dfd libsystem_platform.dylib`_sigtramp + 29
frame #11: 0x00007ff815accd5b libsystem_kernel.dylib`__select + 11
frame #12: 0x000000010a74689c postgres`ServerLoop at postmaster.c:1768:13 [opt]
frame #13: 0x000000010a743fbb postgres`PostmasterMain(argc=<unavailable>, argv=0x00006000006480a0) at postmaster.c:1476:11 [opt]
frame #14: 0x000000010a61c775 postgres`main(argc=8, argv=<unavailable>) at main.c:197:3 [opt]

Both stop at the same place in ParseCommitRecord (xactdesc.c:102:30), so it
looks like it might be the same bug, but perhaps not.

I recompiled access/transam and access/rmgrdesc at -O0 to get the accurate
line numbers shown for those files. Let me know if you need any more
info; I can add -O0 in more places, or poke around in the cores.
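
For reference, a minimal sketch of the general class of problem that -fsanitize=alignment reports: reading a multi-byte field through a pointer into a byte buffer at an offset that is not a multiple of the field's alignment. This is not the code at xactdesc.c:102, and it makes no claim about what the actual bug is. Every name below (demo_is_aligned, demo_align_up, demo_read_u64) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers, not PostgreSQL functions.
 *
 * Return 1 if p is a multiple of align (align must be a power of two).
 */
static int
demo_is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t) p & (align - 1)) == 0;
}

/*
 * Round an offset up to the next multiple of align (a power of two).
 * A record parser can use this to skip padding before a wider field.
 */
static size_t
demo_align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/*
 * Read a uint64_t from an arbitrary byte position.
 *
 * Writing "*(const uint64_t *) data" instead is what the alignment
 * sanitizer complains about when data is misaligned; copying the bytes
 * into an aligned local avoids the misaligned load.
 */
static uint64_t
demo_read_u64(const char *data)
{
    uint64_t    v;

    memcpy(&v, data, sizeof(v));
    return v;
}
```

With an 8-byte-aligned buffer buf, demo_is_aligned(buf + 1, 8) is false, so a direct cast-and-dereference at buf + 1 would be flagged, while demo_read_u64(buf + 1) is fine. The other option is to lay out the record so each field starts at demo_align_up(offset, alignment).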

regards, tom lane
