RE: WaitEventSetWaitBlock() can still hang on Windows due to connection reset

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Alexander Lakhin' <exclusion(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: RE: WaitEventSetWaitBlock() can still hang on Windows due to connection reset
Date: 2025-04-17 07:10:01
Message-ID: OSCPR01MB149661D2921D81502B5E73AF8F5BC2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Alexander,

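While analyzing the BF failure [1], I noticed that the same issue may have happened there:
the apply worker was blocked waiting for something. According to the logs, the apply worker
(PID 2820) got stuck, so logical replication could not restart. It started at 05:08:44.630,
4 ms before the publisher logged "database system is shut down", and the subscriber log shows
nothing further from it until the immediate shutdown at 05:15:01.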

Regress log:
```
### Restarting node "pub"
# Running: pg_ctl --wait --pgdata C:\\prog\\bf\\root\\HEAD\\pgsql.build/...
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
# Postmaster PID for node "pub" is 980
timed out waiting for match: (?^:Streaming transactions committing after ([A-F0-9]+/[A-F0-9]+), ...
```

Subscriber log:
```
2025-04-12 05:08:44.630 UTC [2820:1] LOG: logical replication apply worker for subscription "sub" has started
2025-04-12 05:08:44.642 UTC [5652:6] LOG: background worker "logical replication apply worker" (PID 6344) exited with exit code 1
2025-04-12 05:13:27.352 UTC [3988:1] LOG: checkpoint starting: time
2025-04-12 05:13:36.825 UTC [3988:2] LOG: checkpoint complete: wrote 62 buffers ...
2025-04-12 05:15:01.265 UTC [5652:7] LOG: received immediate shutdown request
2025-04-12 05:15:01.353 UTC [5652:8] LOG: database system is shut down
```

Publisher log:
```
2025-04-12 05:08:44.634 UTC [1112:7] LOG: database system is shut down
2025-04-12 05:08:45.685 UTC [980:1] LOG: starting PostgreSQL 18devel on...
2025-04-12 05:08:45.687 UTC [980:2] LOG: listening on IPv4 address "127.0.0.1", port 18057
2025-04-12 05:08:46.225 UTC [4392:1] LOG: database system was shut down at 2025-04-12 05:08:43 UTC
2025-04-12 05:08:46.319 UTC [980:3] LOG: database system is ready to accept connections
2025-04-12 05:15:00.408 UTC [980:4] LOG: received immediate shutdown request
2025-04-12 05:15:00.942 UTC [980:5] LOG: database system is shut down
```
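The timeline across the two logs can be checked mechanically. Below is a minimal, hypothetical Python sketch (not part of the PostgreSQL tree or the buildfarm tooling) that parses lines of the form `TIMESTAMP UTC [PID:LINE] LEVEL: message`, as in the excerpts above, and flags apply workers whose last log line precedes the publisher's "ready to accept connections" time and that then stay silent. The function names, the "has started" match, and the 60-second threshold are assumptions for illustration; the publisher's ready time is copied from its log by hand.

```python
import re
from datetime import datetime

# Hypothetical pattern for excerpt lines such as:
# 2025-04-12 05:08:44.630 UTC [2820:1] LOG: logical replication apply worker ...
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) UTC \[(\d+):\d+\] \w+: (.*)$"
)


def parse(lines):
    """Yield (timestamp, pid, message) for every line that matches LINE_RE."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            yield ts, int(m.group(2)), m.group(3)


def silent_apply_workers(sub_lines, publisher_ready_at, min_silence_s=60.0):
    """Return PIDs of apply workers whose last log line precedes
    publisher_ready_at and that then stay silent for at least
    min_silence_s seconds, measured up to the last line of the log."""
    entries = list(parse(sub_lines))
    if not entries:
        return []
    log_end = entries[-1][0]
    last_seen = {}  # pid -> timestamp of that pid's most recent line
    started = []    # pids that announced an apply-worker start
    for ts, pid, msg in entries:
        last_seen[pid] = ts
        if "apply worker" in msg and msg.endswith("has started"):
            started.append(pid)
    return [
        pid
        for pid in started
        if last_seen[pid] < publisher_ready_at
        and (log_end - last_seen[pid]).total_seconds() >= min_silence_s
    ]


# Subscriber excerpt from the BF run, copied verbatim.
SUBSCRIBER_LOG = """\
2025-04-12 05:08:44.630 UTC [2820:1] LOG: logical replication apply worker for subscription "sub" has started
2025-04-12 05:08:44.642 UTC [5652:6] LOG: background worker "logical replication apply worker" (PID 6344) exited with exit code 1
2025-04-12 05:13:27.352 UTC [3988:1] LOG: checkpoint starting: time
2025-04-12 05:13:36.825 UTC [3988:2] LOG: checkpoint complete: wrote 62 buffers ...
2025-04-12 05:15:01.265 UTC [5652:7] LOG: received immediate shutdown request
2025-04-12 05:15:01.353 UTC [5652:8] LOG: database system is shut down
"""

# Publisher: "database system is ready to accept connections".
PUBLISHER_READY_AT = datetime(2025, 4, 12, 5, 8, 46, 319000)

if __name__ == "__main__":
    print(silent_apply_workers(SUBSCRIBER_LOG.splitlines(), PUBLISHER_READY_AT))
    # prints [2820]
```

A silent worker is only a symptom: this check cannot by itself show that the worker is blocked inside WaitEventSetWaitBlock(); that still needs a stack trace or further instrumentation.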

So far, this issue has been reported for both physical and logical replication,
but I suspect it can happen in any application.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=drongo&dt=2025-04-12%2003%3A59%3A38&stg=recovery-check

Best regards,
Hayato Kuroda
FUJITSU LIMITED
