Re: BUG #18815: Logical replication worker Segmentation fault

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sergey Belyashov <sergey(dot)belyashov(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18815: Logical replication worker Segmentation fault
Date: 2025-02-17 23:37:56
Message-ID: 1072645.1739835476@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

I wrote:
> Further to this ... I'd still really like to have a reproducer.
> While brininsertcleanup is clearly being less robust than it should
> be, I now suspect that there is another bug somewhere further down
> the call stack. We're getting to this point via ExecCloseIndices,
> and that should be paired with ExecOpenIndices, and that would have
> created a fresh IndexInfo. So it looks a lot like some path in a
> logrep worker is able to call ExecCloseIndices twice on the same
> working data. That would probably lead to a "releasing a lock you
> don't own" error if we weren't hitting this crash first.

Hmm ... I tried modifying ExecCloseIndices to blow up if called
twice, as in the attached. This gets through core regression
just fine, but it blows up in three different subscription TAP
tests, all with a stack trace matching Sergey's:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f064bfe3e65 in __GI_abort () at abort.c:79
#2  0x00000000009e9253 in ExceptionalCondition (
    conditionName=conditionName@entry=0xb8717b "indexDescs[i] != NULL",
    fileName=fileName@entry=0xb87139 "execIndexing.c",
    lineNumber=lineNumber@entry=249) at assert.c:66
#3  0x00000000006f0b13 in ExecCloseIndices (
    resultRelInfo=resultRelInfo@entry=0x2f11c18) at execIndexing.c:249
#4  0x00000000006f86d8 in ExecCleanupTupleRouting (mtstate=0x2ef92d8,
    proute=0x2ef94e8) at execPartition.c:1273
#5  0x0000000000848cb6 in finish_edata (edata=0x2ef8f50) at worker.c:717
#6  0x000000000084d0a0 in apply_handle_insert (s=<optimized out>)
    at worker.c:2460
#7  apply_dispatch (s=<optimized out>) at worker.c:3389
#8  0x000000000084e494 in LogicalRepApplyLoop (last_received=25066600)
    at worker.c:3680
#9  start_apply (origin_startpos=0) at worker.c:4507
#10 0x000000000084e711 in run_apply_worker () at worker.c:4629
#11 ApplyWorkerMain (main_arg=<optimized out>) at worker.c:4798
#12 0x00000000008138f9 in BackgroundWorkerMain (startup_data=<optimized out>,
    startup_data_len=<optimized out>) at bgworker.c:842

The problem seems to be that apply_handle_insert_internal does
ExecOpenIndices and then ExecCloseIndices, and then
ExecCleanupTupleRouting does ExecCloseIndices again, which nicely
explains why brininsertcleanup blows up if you happen to have a BRIN
index involved. What it doesn't explain is how come we don't see
other symptoms from the duplicate index_close calls, regardless of
index type. I'd have expected an assertion failure from
RelationDecrementReferenceCount, and/or an assertion failure for
nonzero rd_refcnt at transaction end, and/or a "you don't own a lock
of type X" gripe from LockRelease. We aren't getting any of those,
but why not, if this code is as broken as I think it is?

(On closer inspection, we seem to have about 99% broken relcache.c's
ability to notice rd_refcnt being nonzero at transaction end, but
the other two things should still be happening.)
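
For readers who want to see the open/close/close sequence in isolation,
here is a minimal self-contained sketch. It is not the attached patch and
it is not PostgreSQL code: ToyIndex, ToyResultRelInfo, toy_open_indices and
toy_close_indices are hypothetical names invented for illustration. The
only idea borrowed from the trace above is that a slot is set to NULL once
it has been closed, so that a second close pass can notice it. The sketch
counts duplicates instead of aborting the way an Assert would.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INDICES 4

/* Hypothetical stand-in for an open index; only the refcount matters here. */
typedef struct ToyIndex
{
    int         refcnt;
} ToyIndex;

/* Hypothetical stand-in for the per-relation index bookkeeping. */
typedef struct ToyResultRelInfo
{
    int         num_indices;
    ToyIndex   *index_descs[MAX_INDICES];
} ToyResultRelInfo;

/* Open every index: bump its refcount and remember it in the slot array. */
static void
toy_open_indices(ToyResultRelInfo *rri, ToyIndex *indexes, int n)
{
    assert(rri != NULL && n >= 0 && n <= MAX_INDICES);
    rri->num_indices = n;
    for (int i = 0; i < n; i++)
    {
        indexes[i].refcnt++;
        rri->index_descs[i] = &indexes[i];
    }
}

/*
 * Close every index that is still open and clear its slot.
 * Returns the number of slots that had already been closed, i.e. how
 * many duplicate closes this call would have performed.
 */
static int
toy_close_indices(ToyResultRelInfo *rri)
{
    int         duplicates = 0;

    assert(rri != NULL);
    for (int i = 0; i < rri->num_indices; i++)
    {
        ToyIndex   *idx = rri->index_descs[i];

        if (idx == NULL)
        {
            /* Already closed once: report it rather than touch refcnt. */
            duplicates++;
            continue;
        }
        idx->refcnt--;
        rri->index_descs[i] = NULL; /* mark the slot as closed */
    }
    return duplicates;
}
```

In this toy model, a first toy_close_indices call after toy_open_indices
returns 0 and brings each refcnt back to zero; a second call on the same
ToyResultRelInfo returns the number of indexes and leaves the refcounts
alone. Without the NULL marking, the second pass would decrement each
refcnt below zero, which is the sort of symptom one would expect a real
reference-count or lock check to complain about.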

regards, tom lane

Attachment Content-Type Size
complain-about-duplicate-ExecCloseIndices.patch text/x-diff 678 bytes
