From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: Refactoring postmaster's code to cleanup after child exit |
Date: | 2024-12-09 20:55:45 |
Message-ID: | 574340ca-c90e-4086-9ae2-cd2f1359e989@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 09/12/2024 14:47, Tomas Vondra wrote:
> On 12/9/24 13:30, Heikki Linnakangas wrote:
>> On 09/12/2024 01:12, Tomas Vondra wrote:
>>> On 11/14/24 15:13, Heikki Linnakangas wrote:
>>>> On 09/10/2024 23:40, Heikki Linnakangas wrote:
>>>>> I pushed the first three patches, with the new test and one of the
>>>>> small
>>>>> refactoring patches. Thanks for all the comments so far! Here is a new
>>>>> version of the remaining patches.
>>>>>
>>> Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
>>> seems to have problems with valgrind :-( I reliably get this failure:
>>
>> How exactly do you run the test with valgrind? What platform?
>>
>
> It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian).
>
>> It works for me, with this:
>>
>> (cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
>> valgrind --leak-check=no --gen-suppressions=all --suppressions=/home/
>> heikki/git-sandbox/postgresql/src/tools/valgrind.supp --time-stamp=yes
>> --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END --log-file=$HOME/
>> pg-valgrind/%p.log --trace-children=yes meson test --suite postmaster )
>>
>
> I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I
> just do
>
> ./configure --enable-debug --prefix=/home/user/builds/master
> --enable-depend --enable-cassert --enable-tap-tests CPPFLAGS="-O0 -ggdb3
> -DUSE_VALGRIND"
>
> and then the usual "make check" or whatever.
>
> The patch has a hardcoded path to the .supp file, and places the
> valgrind log into /tmp. It has worked for me fine up until that commit,
> and it still seems to be working in every other test directory.
Ok, I was able to reproduce this with that setup.
Unsurprisingly, it's a timing issue. It can be reproduced without
valgrind by adding this delay:
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 289059435a9..1eb6bad72ca 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -583,6 +583,7 @@ errfinish(const char *filename, int lineno, const
char *funcname)
* FATAL termination. The postmaster may or may not consider this
* worthy of panic, depending on which subprocess returns it.
*/
+ sleep(1);
proc_exit(1);
}
The test opens a connection that is expected to fail with the "remaining
connection slots are reserved for roles with the SUPERUSER attribute"
error. Right after that, it opens a new connection as superuser, and
expects it to succeed. But if the previous backend hasn't exited yet,
the new connection fails with "too many clients already".
Not sure how to fix this. A small sleep in the test would work, but in
principle there's no delay that's guaranteed to be enough. A more robust
solution would be to run a "select count(*) from pg_stat_activity" and
wait until the number of connections are what's expected. I'll try that
and see how complicated that gets..
--
Heikki Linnakangas
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Dmitry Koval | 2024-12-09 21:01:43 | Re: Add SPLIT PARTITION/MERGE PARTITIONS commands |
Previous Message | Michail Nikolaev | 2024-12-09 20:53:00 | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements |