Regression tests fail on OpenBSD due to low semmns value

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Regression tests fail on OpenBSD due to low semmns value
Date: 2024-12-16 05:00:00
Message-ID: db2773a2-aca0-43d0-99c1-060efcd9954e@gmail.com
Lists: pgsql-hackers

Hello hackers,

A recent buildfarm timeout failure on sawshark [1] made me wonder what's
wrong with that animal: besides that failure, this animal (running on
OpenBSD 7.4) has produced "too many clients" errors from time to time, e.g.,
[2], [3].

I deployed OpenBSD 7.4 locally and reproduced both the "too many clients"
errors and the hang. It turned out that OpenBSD has semmns as low as 60 by
default (see [4]), and as a consequence, initdb sets max_connections = 20 for
the regression test database. (This can be helpful sometimes; see, e.g., [5].)
At the same time, parallel_schedule contains groups of 20 tests, for instance:
# parallel group (20 tests):  select_into random delete select_having select_distinct_on case prepared_xacts namespace select_implicit union arrays portals transactions select_distinct subselect update join aggregates hash_index btree_index
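For a rough idea of where the 20 comes from, here is a back-of-the-envelope
sketch in Python (not PostgreSQL code). It assumes the SEMMNS formula from
the "Managing Kernel Resources" section of the PostgreSQL 16-era
documentation, ceil((max_connections + autovacuum_max_workers +
max_wal_senders + max_worker_processes + 7) / 16) * 17, default values for
the other settings, and a hypothetical descending list of candidates similar
to what initdb probes. It ignores SEMMNI/SEMMSL and semaphores used by other
programs. Under those assumptions it picks 20 for semmns=60 and 50 for
semmns=120.

```python
import math

# Assumed defaults of the other settings that consume semaphores
# (PostgreSQL 16-era values; they may differ in other versions).
AUTOVACUUM_MAX_WORKERS = 3
MAX_WAL_SENDERS = 10
MAX_WORKER_PROCESSES = 8
FIXED_PROCS = 7      # constant term of the documented formula (version-dependent)
SEMAS_PER_SET = 16   # each set of 16 semaphores costs 17 against SEMMNS

# Hypothetical descending candidates, modeled on what initdb probes.
TRIAL_CONNS = [100, 50, 40, 30, 25, 20]


def semaphores_needed(max_connections):
    """SEMMNS consumed for a given max_connections, per the documented formula."""
    procs = (max_connections + AUTOVACUUM_MAX_WORKERS + MAX_WAL_SENDERS
             + MAX_WORKER_PROCESSES + FIXED_PROCS)
    return math.ceil(procs / SEMAS_PER_SET) * (SEMAS_PER_SET + 1)


def pick_max_connections(semmns):
    """Largest candidate whose semaphore needs fit within semmns, or None."""
    for n in TRIAL_CONNS:
        if semaphores_needed(n) <= semmns:
            return n
    return None


for semmns in (60, 120):
    print(semmns, pick_max_connections(semmns))
```

With 20 connections the formula needs 3 sets (51 semaphores), while 25 or 30
already need 4 sets (68), which exceeds 60.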

Moreover, prepared_xacts performs "\c", which briefly adds one more
connection, according to postmaster.log:
2024-12-16 06:18:20.290 EET [regression][1563560:91][client backend] [pg_regress/prepared_xacts] LOG:  statement: rollback;
...
2024-12-16 06:18:20.290 EET [regression][1563561:2][client backend] [[unknown]] FATAL:  sorry, too many clients already
...
2024-12-16 06:18:20.291 EET [regression][1563560:95][client backend] [pg_regress/prepared_xacts] LOG:  disconnection: session time: 0:00:00.018 user=law database=regression host=[local]
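The connection budget can be checked with simple arithmetic. The Python
sketch below is only a simplified model: the helper names and the
RECONNECTING_TESTS set are hypothetical, every test in a group is assumed to
hold one connection for its whole run, and a test doing "\c" is assumed to
need a second slot briefly (the log above shows the new connection being
attempted before the old session disconnects). The peak is compared against
max_connections as a whole.

```python
# Tests assumed to briefly hold a second connection via psql's \c
# (prepared_xacts is the one seen in the log; the set is illustrative).
RECONNECTING_TESTS = {"prepared_xacts"}


def peak_connections(group):
    # One connection per test, plus one extra for each test that runs \c,
    # assuming all of them overlap in time (worst case).
    return len(group) + sum(1 for t in group if t in RECONNECTING_TESTS)


def oversized_groups(schedule_text, max_connections):
    # Return (line_no, peak) for each "test:" line whose peak exceeds the limit.
    result = []
    for line_no, line in enumerate(schedule_text.splitlines(), start=1):
        line = line.strip()
        if not line.startswith("test:"):
            continue  # skip comments and blank lines
        group = line[len("test:"):].split()
        peak = peak_connections(group)
        if peak > max_connections:
            result.append((line_no, peak))
    return result


group = ("select_into random delete select_having select_distinct_on case "
         "prepared_xacts namespace select_implicit union arrays portals "
         "transactions select_distinct subselect update join aggregates "
         "hash_index btree_index").split()
print(len(group), peak_connections(group))
```

For the 20-test group above the model reports a peak of 21, one more than
max_connections = 20.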

Setting sysctl kern.seminfo.semmns=120 makes the issue go away on this OS;
on the other hand, "too many clients" failures can be reproduced on other
OSes with "max_connections=20" in TEMP_CONFIG.

As for the hang, it can be reproduced easily with a TEMP_CONFIG containing:
max_connections=2
superuser_reserved_connections=0

and parallel_schedule as simple as:
test: transactions prepared_xacts
test: transactions prepared_xacts

Running `TEMP_CONFIG=.../extra.config make -s check`, I can see:
# +++ regress check in src/test/regress +++
...
# parallel group (2 tests):  prepared_xacts transactions
not ok 1     + transactions                               56 ms
not ok 2     + prepared_xacts                             21 ms
# (test process exited with exit code 2)
# parallel group (2 tests):
### the test is hanging here ###

with one backend waiting inside:
#0  0x000070c41ed2a007 in epoll_wait (epfd=6, events=0x629f1ce529e8, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x0000629f1410d64a in WaitEventSetWaitBlock (set=0x629f1ce52980, cur_timeout=-1, occurred_events=0x7ffd4c4ffed0, nevents=1) at latch.c:1564
#2  0x0000629f1410d534 in WaitEventSetWait (set=0x629f1ce52980, timeout=-1, occurred_events=0x7ffd4c4ffed0, nevents=1, wait_event_info=134217779) at latch.c:1510
#3  0x0000629f1410c764 in WaitLatch (latch=0x70c41b86bc24, wakeEvents=33, timeout=0, wait_event_info=134217779) at latch.c:538
#4  0x0000629f1413d032 in ProcWaitForSignal (wait_event_info=134217779) at proc.c:1893
#5  0x0000629f14132eb9 in GetSafeSnapshot (origSnapshot=0x629f147ad360 <CurrentSnapshotData>) at predicate.c:1579
#6  0x0000629f14133261 in GetSerializableTransactionSnapshot (snapshot=0x629f147ad360 <CurrentSnapshotData>) at predicate.c:1695
#7  0x0000629f143afafe in GetTransactionSnapshot () at snapmgr.c:253
#8  0x0000629f1414a7b8 in exec_simple_query (query_string=0x629f1ce580f0 "SELECT * FROM writetest;") at postgres.c:1172
...

So GetSafeSnapshot() waits indefinitely for possibleUnsafeConflicts to
become empty (i.e., for the other backend to remove itself from the list of
possible conflicts inside ReleasePredicateLocks()), but that never happens.
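A toy model of that wait, in Python, may make the failure mode clearer. It
is not PostgreSQL code: the class and method names are hypothetical, and a
threading.Condition stands in for the latch / ProcWaitForSignal() machinery.
The waiter returns only once every conflicting transaction has removed
itself; if none ever does, the real wait (timeout=-1 in the backtrace) never
ends, which the model can only show by giving up after an explicit timeout.

```python
import threading


class SafeSnapshotWaitModel:
    """Toy analogue of waiting for possibleUnsafeConflicts to drain."""

    def __init__(self, conflicts):
        # Transactions the waiter must outlive before its snapshot is "safe".
        self.possible_unsafe_conflicts = set(conflicts)
        self.cond = threading.Condition()

    def release(self, xact):
        # Analogue of a backend removing itself from the list and waking waiters.
        with self.cond:
            self.possible_unsafe_conflicts.discard(xact)
            self.cond.notify_all()

    def wait_until_safe(self, timeout=None):
        # The real wait has no timeout; the optional timeout here only lets
        # the model report "would hang" instead of blocking forever.
        # Returns True once the set is empty, False if the timeout expired.
        with self.cond:
            return self.cond.wait_for(
                lambda: not self.possible_unsafe_conflicts, timeout=timeout)


if __name__ == "__main__":
    ok = SafeSnapshotWaitModel({"rw_xact"})
    threading.Thread(target=ok.release, args=("rw_xact",)).start()
    print("released:", ok.wait_until_safe(timeout=5))
    stuck = SafeSnapshotWaitModel({"rw_xact"})  # nobody will release it
    print("released:", stuck.wait_until_safe(timeout=0.5))
```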

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-12-11%2012%3A20%3A05
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-07-22%2001%3A20%3A22
[3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-11-25%2006%3A20%3A22
[4] https://man.openbsd.org/options
[5] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=73c9f91a1

Best regards,
Alexander
