From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions |
Date: | 2020-09-14 01:27:56 |
Message-ID: | 929224.1600046876@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I wrote:
> Probably this requires a relcache inval at the wrong time;
> although we have recent passes from CLOBBER_CACHE_ALWAYS animals,
> so that can't be the whole triggering condition. I wonder whether
> it is relevant that all of the complaining animals are JIT-enabled.
Hmmm ... I take that back. hyrax has indeed passed since this went
in, but *it doesn't run any TAP tests*. So the buildfarm offers no
information about whether the replication tests work under
CLOBBER_CACHE_ALWAYS.
Realizing that, I built an installation that way and tried to run
the subscription tests. Results so far:
* Running 010_truncate.pl by itself passed for me. So there's still
some unexplained factor needed to trigger the buildfarm failures.
(I'm wondering about concurrent autovacuum activity now...)
* Starting over, it appears that 001_rep_changes.pl almost immediately
gets into an infinite loop. It does not complete the third test step,
rather infinitely waiting for progress to be made. The publisher log
shows a repeating loop like
2020-09-13 21:16:05.734 EDT [928529] tap_sub LOG: could not send data to client: Broken pipe
2020-09-13 21:16:05.734 EDT [928529] tap_sub CONTEXT: slot "tap_sub", output plugin "pgoutput", in the commit callback, associated LSN 0/1660628
2020-09-13 21:16:05.843 EDT [928581] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
2020-09-13 21:16:05.861 EDT [928582] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false);
2020-09-13 21:16:05.929 EDT [928582] tap_sub LOG: received replication command: IDENTIFY_SYSTEM
2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"')
2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: starting logical decoding for slot "tap_sub"
2020-09-13 21:16:05.930 EDT [928582] tap_sub DETAIL: Streaming transactions committing after 0/1652820, reading WAL from 0/1651B20.
2020-09-13 21:16:05.930 EDT [928582] tap_sub LOG: logical decoding found consistent point at 0/1651B20
2020-09-13 21:16:05.930 EDT [928582] tap_sub DETAIL: There are no running transactions.
2020-09-13 21:16:21.560 EDT [928600] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
2020-09-13 21:16:37.291 EDT [928610] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
2020-09-13 21:16:52.959 EDT [928627] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
2020-09-13 21:17:06.866 EDT [928636] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false);
2020-09-13 21:17:06.934 EDT [928636] tap_sub LOG: received replication command: IDENTIFY_SYSTEM
2020-09-13 21:17:06.934 EDT [928636] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"')
2020-09-13 21:17:06.934 EDT [928636] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582
2020-09-13 21:17:07.811 EDT [928638] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false);
2020-09-13 21:17:07.880 EDT [928638] tap_sub LOG: received replication command: IDENTIFY_SYSTEM
2020-09-13 21:17:07.881 EDT [928638] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"')
2020-09-13 21:17:07.881 EDT [928638] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582
2020-09-13 21:17:08.618 EDT [928641] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
2020-09-13 21:17:08.753 EDT [928642] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false);
2020-09-13 21:17:08.821 EDT [928642] tap_sub LOG: received replication command: IDENTIFY_SYSTEM
2020-09-13 21:17:08.821 EDT [928642] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"')
2020-09-13 21:17:08.821 EDT [928642] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582
2020-09-13 21:17:09.689 EDT [928645] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false);
2020-09-13 21:17:09.756 EDT [928645] tap_sub LOG: received replication command: IDENTIFY_SYSTEM
2020-09-13 21:17:09.757 EDT [928645] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/1652820 (proto_version '2', publication_names '"tap_pub","tap_pub_ins_only"')
2020-09-13 21:17:09.757 EDT [928645] tap_sub ERROR: replication slot "tap_sub" is active for PID 928582
2020-09-13 21:17:09.841 EDT [928582] tap_sub LOG: could not send data to client: Broken pipe
2020-09-13 21:17:09.841 EDT [928582] tap_sub CONTEXT: slot "tap_sub", output plugin "pgoutput", in the commit callback, associated LSN 0/1660628
while the subscriber is repeating
2020-09-13 21:15:01.598 EDT [928528] LOG: logical replication apply worker for subscription "tap_sub" has started
2020-09-13 21:16:02.178 EDT [928528] ERROR: terminating logical replication worker due to timeout
2020-09-13 21:16:02.179 EDT [920797] LOG: background worker "logical replication worker" (PID 928528) exited with exit code 1
2020-09-13 21:16:02.606 EDT [928571] LOG: logical replication apply worker for subscription "tap_sub" has started
2020-09-13 21:16:03.117 EDT [928571] ERROR: could not start WAL streaming: ERROR: replication slot "tap_sub" is active for PID 928529
2020-09-13 21:16:03.118 EDT [920797] LOG: background worker "logical replication worker" (PID 928571) exited with exit code 1
2020-09-13 21:16:03.544 EDT [928574] LOG: logical replication apply worker for subscription "tap_sub" has started
2020-09-13 21:16:04.053 EDT [928574] ERROR: could not start WAL streaming: ERROR: replication slot "tap_sub" is active for PID 928529
2020-09-13 21:16:04.054 EDT [920797] LOG: background worker "logical replication worker" (PID 928574) exited with exit code 1
2020-09-13 21:16:04.479 EDT [928576] LOG: logical replication apply worker for subscription "tap_sub" has started
2020-09-13 21:16:04.990 EDT [928576] ERROR: could not start WAL streaming: ERROR: replication slot "tap_sub" is active for PID 928529
2020-09-13 21:16:04.990 EDT [920797] LOG: background worker "logical replication worker" (PID 928576) exited with exit code 1
2020-09-13 21:16:05.415 EDT [928579] LOG: logical replication apply worker for subscription "tap_sub" has started
2020-09-13 21:17:05.994 EDT [928579] ERROR: terminating logical replication worker due to timeout
I'm out of patience to investigate this for tonight, but there is
something extremely broken here; maybe more than one something.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2020-09-14 01:45:49 | Re: Avoid incorrect allocation in buildIndexArray |
Previous Message | Tom Lane | 2020-09-13 23:00:26 | Re: Gripes about walsender command processing |