From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Sergey Belyashov <sergey(dot)belyashov(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me> |
Subject: | Re: BUG #18815: Logical replication worker Segmentation fault |
Date: | 2025-02-18 20:38:34 |
Message-ID: | 1761873.1739911114@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
[ redirecting to -hackers for patch discussion ]
Here's a completed set of patches for bug #18815 [1] (which,
briefly, is that the logrep worker opens/closes indexes
multiple times and thereby breaks brininsertcleanup).
0001 adds a subscriber-side BRIN index to 013_partition.pl, which
replicates the crash Sergey reported. I've got mixed feelings about
whether to actually commit this: it's certainly useful for testing
this fix, but without context it looks like a pretty random thing to
do. It could be justified perhaps on the grounds of testing
aminsertcleanup callbacks within replication, since BRIN is the
only core index type that has such a callback.
0002 fixes the crash by hardening brininsertcleanup against multiple
calls. I think that this is patching a symptom not the root cause,
but it still seems like good defensive programming.
0003 adds Asserts to ExecOpenIndices and ExecCloseIndices to check
that they're not called more than once per ResultRelInfo, and thereby
exposes what I consider the root cause: apply_handle_tuple_routing
opens the indexes twice and then closes them twice. This somewhat
accidentally leaves us with zero refcounts and zero locks on the
indexes, so in the absence of aminsertcleanup callbacks the only real
reason to complain is the duplicative construction of the IndexInfo
structures. But the double call of brininsertcleanup breaks the
existing coding of that function, and it might well break third-party
aminsertcleanup callbacks if there are any. So I think we should
institute a policy that this coding pattern is forbidden.
And finally, 0004 fixes worker.c to not do that. This turned out
simpler than I thought, because I was misled by believing that the
ExecOpenIndices/ExecCloseIndices calls in apply_handle_tuple_routing
itself do something useful. They don't, and should be nuked outright.
I don't intend 0003 for back-patch, but the rest of this has to go
back as far as the bug can be observed, which I didn't test yet.
To be clear, 0004 would fix the issue even without 0002, but
I still think we should back-patch both.
regards, tom lane
[1] https://www.postgresql.org/message-id/flat/18815-2a0407cc7f40b327%40postgresql.org
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Demonstrate-crash-in-brininsertcleanup-during-log.patch | text/x-diff | 1.3 KB |
v1-0002-Harden-brininsertcleanup-against-repeat-calls.patch | text/x-diff | 1.7 KB |
v1-0003-Assert-that-ExecOpenIndices-and-ExecCloseIndices-.patch | text/x-diff | 2.5 KB |
v1-0004-Avoid-duplicate-ExecOpenIndices-ExecCloseIndices-.patch | text/x-diff | 4.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2025-02-18 20:52:08 | Re: Inconsistency of timezones in postgresql |
Previous Message | Sergey Belyashov | 2025-02-18 17:41:55 | Re: BUG #18815: Logical replication worker Segmentation fault |
From | Date | Subject | |
---|---|---|---|
Next Message | Rafael Thofehrn Castro | 2025-02-18 20:47:58 | Re: Proposal: Progressive explain |
Previous Message | Zsolt Parragi | 2025-02-18 20:02:13 | Re: Improve pgindent exclude handling: ignore empty lines |