From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Race conditions in 019_replslot_limit.pl |
Date: | 2022-02-17 01:22:23 |
Message-ID: | 3518071.1645060943@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Andres Freund <andres(at)anarazel(dot)de> writes:
> I think the test is telling us that something may be broken. We shouldn't
> silence that without at least some understanding what it is.
I looked at the recent failure on komodoensis [1], and I think what is
happening is just that the walsender for the basebackup run (launched
at 019_replslot_limit.pl line 325) hasn't exited yet at the point where
we do a blind
"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walsender'"
and expect that we're only going to see the walsender launched for
the standby at line 331. The two PIDs reported in the failure
correspond to this postmaster log trace:
2022-02-16 23:06:29.596 CET [620d7565.38dd62:1] LOG: connection received: host=[local]
2022-02-16 23:06:29.596 CET [620d7565.38dd62:2] LOG: replication connection authorized: user=bf application_name=019_replslot_limit.pl
2022-02-16 23:06:29.596 CET [620d7565.38dd62:3] LOG: received replication command: SHOW data_directory_mode
2022-02-16 23:06:29.596 CET [620d7565.38dd62:4] STATEMENT: SHOW data_directory_mode
2022-02-16 23:06:29.596 CET [620d7565.38dd62:5] LOG: received replication command: CREATE_REPLICATION_SLOT "pg_basebackup_3726690" TEMPORARY PHYSICAL ( RESERVE_WAL)
2022-02-16 23:06:29.596 CET [620d7565.38dd62:6] STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_3726690" TEMPORARY PHYSICAL ( RESERVE_WAL)
2022-02-16 23:06:29.597 CET [620d7565.38dd62:7] LOG: received replication command: IDENTIFY_SYSTEM
2022-02-16 23:06:29.597 CET [620d7565.38dd62:8] STATEMENT: IDENTIFY_SYSTEM
2022-02-16 23:06:29.597 CET [620d7565.38dd62:9] LOG: received replication command: START_REPLICATION SLOT "pg_basebackup_3726690" 0/600000 TIMELINE 1
2022-02-16 23:06:29.597 CET [620d7565.38dd62:10] STATEMENT: START_REPLICATION SLOT "pg_basebackup_3726690" 0/600000 TIMELINE 1
and this one:
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:1] LOG: connection received: host=[local]
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:2] LOG: replication connection authorized: user=bf application_name=standby_3
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:3] LOG: received replication command: IDENTIFY_SYSTEM
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:4] STATEMENT: IDENTIFY_SYSTEM
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:5] LOG: received replication command: START_REPLICATION SLOT "rep3" 0/700000 TIMELINE 1
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:6] STATEMENT: START_REPLICATION SLOT "rep3" 0/700000 TIMELINE 1
There's no disconnection log entry for either, which I suppose means
that somebody didn't bother logging disconnection for walsenders ...
shouldn't we fix that? But in any case, I don't see anything
interesting here, just a query that needs to be more selective.
Perhaps we can look for application_name=standby_3?
regards, tom lane
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2022-02-16%2021%3A16%3A04
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-02-17 01:28:02 | Re: libpq async duplicate error results |
Previous Message | Peter Geoghegan | 2022-02-17 01:16:13 | Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM |