From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Subject: | Re: Testing autovacuum wraparound (including failsafe) |
Date: | 2024-05-22 12:00:00 |
Message-ID: | 5811175c-1a31-4869-032f-7af5e3e4506a@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
30.11.2023 10:35, Masahiko Sawada wrote:
>
>> I've attached new version patches (0002 and 0003 are unchanged except
>> for the commit message). I'll push them, barring any objections.
>>
> Pushed.
I've discovered that the test 001_emergency_vacuum.pl can fail due to a
race condition. I can't see the server log at [1], but I reproduced the
failure locally and with additional logging and log_min_messages = DEBUG3,
the log shows:
...
2024-05-22 11:46:28.125 UTC [21256:2853] DEBUG: SlruScanDirectory invoking callback on pg_xact/0690
2024-05-22 11:46:28.125 UTC [21256:2854] DEBUG: transaction ID wrap limit is 2147484396, limited by database with OID 5
2024-05-22 11:46:28.126 UTC [21256:2855] LOG: !!!SendPostmasterSignal| PMSIGNAL_START_AUTOVAC_LAUNCHER
2024-05-22 11:46:28.135 UTC [14871:20077] DEBUG: postmaster received pmsignal signal
2024-05-22 11:46:28.137 UTC [21257:1] DEBUG: autovacuum launcher started
2024-05-22 11:46:28.137 UTC [21257:2] DEBUG: InitPostgres
2024-05-22 11:46:28.138 UTC [21257:3] LOG: !!!AutoVacLauncherMain| !AutoVacuumingActive() && !ShutdownRequestPending;
before do_start_worker()
2024-05-22 11:46:28.138 UTC [21257:4] LOG: !!!do_start_worker| return quickly when there are no free workers
2024-05-22 11:46:28.138 UTC [21257:5] DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:6] DEBUG: shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:7] DEBUG: proc_exit(0): 1 callbacks to make
2024-05-22 11:46:28.138 UTC [21257:8] DEBUG: exit(0)
2024-05-22 11:46:28.138 UTC [21257:9] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:10] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:11] DEBUG: proc_exit(-1): 0 callbacks to make
2024-05-22 11:46:28.146 UTC [21256:2856] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 5
2024-05-22 11:46:28.146 UTC [21256:2857] DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2024-05-22 11:46:28.146 UTC [21256:2858] DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2859] DEBUG: shmem_exit(0): 7 on_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2860] DEBUG: proc_exit(0): 1 callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2861] DEBUG: exit(0)
2024-05-22 11:46:28.147 UTC [21256:2862] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2863] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2864] DEBUG: proc_exit(-1): 0 callbacks to make
2024-05-22 11:46:28.151 UTC [14871:20078] DEBUG: forked new backend, pid=21258 socket=8
2024-05-22 11:46:28.171 UTC [14871:20079] DEBUG: server process (PID 21256) exited with exit code 0
2024-05-22 11:46:28.152 UTC [21258:1] [unknown] LOG: connection received: host=[local]
2024-05-22 11:46:28.171 UTC [21258:2] [unknown] DEBUG: InitPostgres
2024-05-22 11:46:28.172 UTC [21258:3] [unknown] LOG: connection authenticated: user="vagrant" method=trust
(/pgtest/postgresql.git/src/test/modules/xid_wraparound/tmp_check/t_001_emergency_vacuum_main_data/pgdata/pg_hba.conf:117)
2024-05-22 11:46:28.172 UTC [21258:4] [unknown] LOG: connection authorized: user=vagrant database=postgres
application_name=001_emergency_vacuum.pl
2024-05-22 11:46:28.175 UTC [21258:5] 001_emergency_vacuum.pl LOG: statement: INSERT INTO small(data) SELECT 1
That is, autovacuum worker (21256) sent PMSIGNAL_START_AUTOVAC_LAUNCHER,
postmaster started autovacuum launcher, which could not start new
autovacuum worker due to the process 21256 not exited yet.
The failure can be reproduced easily with the sleep added inside
SetTransactionIdLimit():
if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
IsUnderPostmaster && !InRecovery)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
+pg_usleep(10000L);
By the way I also discovered that rather resource-intensive xid_wraparound
tests executed twice during the buildfarm "make" run (on dodo and perentie
(see [2]) animals), at stage module-xid_wraparound-check and then at stage
testmodules-install-check-C.
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-05-19%2006%3A33%3A34
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=perentie&dt=2024-05-22%2000%3A02%3A19
Best regards,
Alexander
From | Date | Subject | |
---|---|---|---|
Next Message | torikoshia | 2024-05-22 12:25:41 | Re: First draft of PG 17 release notes |
Previous Message | Amit Kapila | 2024-05-22 11:42:40 | Re: State of pg_createsubscriber |