From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Non-systematic handling of EINTR/EAGAIN/EWOULDBLOCK |
Date: | 2024-05-09 04:00:01 |
Message-ID: | f9bebfe6-cee4-ed87-d4e6-29b5ca4be08d@gmail.com |
Lists: | pgsql-hackers |
Hello hackers,
Looking at a recent failure on the buildfarm:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2024-04-30%2020%3A48%3A34
# poll_query_until timed out executing this query:
# SELECT archived_count FROM pg_stat_archiver
# expecting this output:
# 1
# last actual query output:
# 0
# with stderr:
# Looks like your test exited with 29 just after 4.
[23:01:41] t/020_archive_status.pl ..............
Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 12/16 subtests
with the following error in the log:
2024-04-30 22:57:27.931 CEST [83115:1] LOG: archive command failed with exit code 1
2024-04-30 22:57:27.931 CEST [83115:2] DETAIL: The failed archive command was: cp
"pg_wal/000000010000000000000001_does_not_exist" "000000010000000000000001_does_not_exist"
...
2024-04-30 22:57:28.070 CEST [47962:2] [unknown] LOG: connection authorized: user=pgbf database=postgres
application_name=020_archive_status.pl
2024-04-30 22:57:28.072 CEST [47962:3] 020_archive_status.pl LOG: statement: SELECT archived_count FROM pg_stat_archiver
2024-04-30 22:57:28.073 CEST [83115:3] LOG: could not send to statistics collector: Resource temporarily unavailable
and the corresponding code (on REL_13_STABLE):
static void
pgstat_send(void *msg, int len)
{
    int rc;
    if (pgStatSock == PGINVALID_SOCKET)
        return;
    ((PgStat_MsgHdr *) msg)->m_size = len;
    /* We'll retry after EINTR, but ignore all other failures */
    do
    {
        rc = send(pgStatSock, msg, len, 0);
    } while (rc < 0 && errno == EINTR);
#ifdef USE_ASSERT_CHECKING
    /* In debug builds, log send failures ... */
    if (rc < 0)
        elog(LOG, "could not send to statistics collector: %m");
#endif
}
I wonder whether this retry should also be performed after EAGAIN (Resource
temporarily unavailable) and EWOULDBLOCK.
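To make the question concrete, here is a minimal sketch of such a retry loop. It is not PostgreSQL code and not a proposed patch: the helper name `send_with_retry`, the `max_waits` cap, and the 10 ms poll() wait are all assumptions chosen for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not PostgreSQL code): send one message, retrying
 * unconditionally after EINTR and, at most max_waits times, after
 * EAGAIN/EWOULDBLOCK.  Returns the send() result; on failure errno is
 * the one set by the last send() call.
 */
static ssize_t
send_with_retry(int sock, const void *msg, size_t len, int max_waits)
{
    ssize_t rc;

    for (;;)
    {
        rc = send(sock, msg, len, 0);
        if (rc >= 0)
            return rc;

        /* Interrupted by a signal: simply try again. */
        if (errno == EINTR)
            continue;

        /* Send buffer full on a non-blocking socket: wait briefly, retry. */
        if ((errno == EAGAIN || errno == EWOULDBLOCK) && max_waits-- > 0)
        {
            struct pollfd pfd = { .fd = sock, .events = POLLOUT };

            /* Assumed 10 ms wait for the socket to become writable. */
            (void) poll(&pfd, 1, 10);
            continue;
        }

        /* Any other error, or retries exhausted: give up. */
        return -1;
    }
}
```

The cap matters because a non-blocking socket whose buffer stays full would otherwise keep the caller looping forever; whether a bounded wait is acceptable on the statistics path is a separate question.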
With a simple send() wrapper (PFA) activated with LD_PRELOAD, I could
reproduce this failure easily when running
`make -s check -C src/test/recovery/ PROVE_TESTS="t/020*"` on
REL_13_STABLE:
t/020_archive_status.pl .. 1/16 # poll_query_until timed out executing this query:
# SELECT archived_count FROM pg_stat_archiver
# expecting this output:
# 1
# last actual query output:
# 0
# with stderr:
# Looks like your test exited with 29 just after 4.
t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 12/16 subtests
I also reproduced another failure (that lacks useful diagnostics, unfortunately):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2022-11-10%2015%3A30%3A16
...
t/020_archive_status.pl .. 8/16 # poll_query_until timed out executing this query:
# SELECT last_archived_wal FROM pg_stat_archiver
# expecting this output:
# 000000010000000000000002
# last actual query output:
# 000000010000000000000001
# with stderr:
# Looks like your test exited with 29 just after 13.
t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 3/16 subtests
...
The "n == 64" condition in the cranky send() is needed to target exactly
these failures. Without this restriction, the test (and also `make check`)
just hangs because of:
    if (errno == EINTR)
        continue; /* Ok if we were interrupted */
    /*
     * Ok if no data writable without blocking, and the socket is in
     * non-blocking mode.
     */
    if (errno == EAGAIN ||
        errno == EWOULDBLOCK)
    {
        return 0;
    }
in internal_flush_buffer().
On the other hand, even with:
int
send(int s, const void *buf, size_t n, int flags)
{
    if (rand() % 10000 == 0)
    {
        errno = EINTR;
        return -1;
    }
    return real_send(s, buf, n, flags);
}
`make check` fails with many miscellaneous errors...
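For readers without the attachment, the following is a minimal sketch of a preloadable send() wrapper of this kind. It is not the attached send-with-EAGAIN.c: the `FAIL_EVERY` knob and the deterministic counter are assumptions made so the behavior is reproducible, and `real_send` is implemented here by forwarding to sendto() (POSIX defines send() as sendto() without a destination address) rather than by looking up the libc symbol.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical knob: every FAIL_EVERY-th matching call fails. */
#define FAIL_EVERY 3

/*
 * Reach the kernel without re-entering our own send(): send() is
 * equivalent to sendto() with a NULL destination address.
 */
static ssize_t
real_send(int s, const void *buf, size_t n, int flags)
{
    return sendto(s, buf, n, flags, NULL, 0);
}

/* Takes the place of libc's send() when this object is preloaded. */
ssize_t
send(int s, const void *buf, size_t n, int flags)
{
    /* Not thread-safe; adequate for single-threaded processes. */
    static unsigned long matching_calls = 0;

    /* Only 64-byte messages are candidates for an injected failure. */
    if (n == 64 && ++matching_calls % FAIL_EVERY == 0)
    {
        errno = EAGAIN;
        return -1;
    }
    return real_send(s, buf, n, flags);
}
```

Assuming the file is saved as cranky_send.c (a hypothetical name), it could be built with `cc -shared -fPIC -o cranky_send.so cranky_send.c` and activated by setting `LD_PRELOAD=./cranky_send.so` in the environment of the `make` command shown earlier.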
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
send-with-EAGAIN.c | text/x-csrc | 462 bytes |