From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | 035_standby_logical_decoding unbounded hang |
Date: | 2024-02-11 01:02:27 |
Message-ID: | 20240211010227.a2.nmisch@google.com |
Lists: | pgsql-hackers |
Coincidentally, one of my buildfarm animals hung for several weeks in a
different test, 035_standby_logical_decoding.pl. A LOG_SNAPSHOT_INTERVAL_MS
reduction was part of making it reproducible:
On Fri, Feb 02, 2024 at 04:01:45PM -0800, Noah Misch wrote:
> On Fri, Feb 02, 2024 at 02:30:03PM -0800, Noah Misch wrote:
> > On Fri, Feb 02, 2024 at 05:07:14PM -0500, Tom Lane wrote:
> > > If you look at the buildfarm's failures page and filter down to
> > > just subscriptionCheck failures, what you find is that all of the
> > > last 6 such failures are in 031_column_list.pl:
> > https://www.postgresql.org/message-id/flat/16d6d9cc-f97d-0b34-be65-425183ed3721%40gmail.com
> > reported a replacement BgWriterDelay value reproducing it.
>
> Correction: the recipe changes LOG_SNAPSHOT_INTERVAL_MS in addition to
> BgWriterDelay.
I'm reusing this thread just in case there's overlap with the
031_column_list.pl cause and fix. The 035_standby_logical_decoding.pl hang is
a race condition arising from an event sequence like this:
- Test script sends CREATE SUBSCRIPTION to subscriber, which loses the CPU.
- Test script calls pg_log_standby_snapshot() on primary. Emits XLOG_RUNNING_XACTS.
- checkpoint_timeout makes a primary checkpoint finish. Emits XLOG_RUNNING_XACTS.
- bgwriter executes LOG_SNAPSHOT_INTERVAL_MS logic. Emits XLOG_RUNNING_XACTS.
- CREATE SUBSCRIPTION wakes up and sends CREATE_REPLICATION_SLOT to standby.
Other test code already has a solution for this, so the attached patches add a
timeout and copy the existing solution. I'm also attaching the hack that
makes it 100% reproducible.
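
The event sequence above can be illustrated with a toy model. This is a sketch under stated assumptions, not code from the attached patches. It assumes that the standby's slot creation can only complete after it sees an XLOG_RUNNING_XACTS record emitted after CREATE_REPLICATION_SLOT arrives, and that the primary emits no further such record once the test goes idle. The `ordered` list is one guess at an ordering that avoids the race; the message does not describe the existing solution in detail. The function and variable names are hypothetical.

```python
RUNNING_XACTS = "XLOG_RUNNING_XACTS"
CREATE_SLOT = "CREATE_REPLICATION_SLOT"


def slot_becomes_consistent(events):
    """Return True if a RUNNING_XACTS record follows slot creation.

    Assumption (not stated in the message): the standby slot needs a
    record that the primary emits after slot creation has started.
    """
    if CREATE_SLOT not in events:
        return False
    start = events.index(CREATE_SLOT)
    return RUNNING_XACTS in events[start + 1:]


# Order described in the message: every record is emitted too early.
racy_order = [
    "CREATE SUBSCRIPTION sent; subscriber loses the CPU",
    RUNNING_XACTS,  # pg_log_standby_snapshot() called by the test script
    RUNNING_XACTS,  # checkpoint_timeout makes a primary checkpoint finish
    RUNNING_XACTS,  # bgwriter LOG_SNAPSHOT_INTERVAL_MS logic
    CREATE_SLOT,    # subscriber wakes up; no later record arrives
]

# Hypothetical safe order: request the snapshot only after the slot
# creation has been observed on the standby.
ordered = [
    "CREATE SUBSCRIPTION sent",
    CREATE_SLOT,
    RUNNING_XACTS,
]

print(slot_becomes_consistent(racy_order))
print(slot_becomes_consistent(ordered))
```

In this model the racy order never satisfies the slot, which corresponds to the unbounded hang, while the second order does.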
Attachment | Content-Type | Size |
---|---|---|
repro-standby-slot-test-race-v1.patch | text/plain | 2.1 KB |
standby-slot-test-1-timeout-v1.patch | text/plain | 1.4 KB |
standby-slot-test-2-race-v1.patch | text/plain | 1.4 KB |