Re: Refactoring postmaster's code to cleanup after child exit

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Refactoring postmaster's code to cleanup after child exit
Date: 2024-12-09 12:47:49
Message-ID: c576eddd-13da-4bad-a760-c98dfe9a314d@vondra.me
Lists: pgsql-hackers

On 12/9/24 13:30, Heikki Linnakangas wrote:
> On 09/12/2024 01:12, Tomas Vondra wrote:
>> On 11/14/24 15:13, Heikki Linnakangas wrote:
>>> On 09/10/2024 23:40, Heikki Linnakangas wrote:
>>>> I pushed the first three patches, with the new test and one of the
>>>> small
>>>> refactoring patches. Thanks for all the comments so far! Here is a new
>>>> version of the remaining patches.
>>>>
>> Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
>> seems to have problems with valgrind :-( I reliably get this failure:
>
> How exactly do you run the test with valgrind? What platform?
>

It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian).

> It works for me, with this:
>
> (cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
> valgrind --leak-check=no --gen-suppressions=all --suppressions=/home/
> heikki/git-sandbox/postgresql/src/tools/valgrind.supp --time-stamp=yes
> --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END --log-file=$HOME/
> pg-valgrind/%p.log --trace-children=yes meson test --suite postmaster )
>

I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I
just do

./configure --enable-debug --prefix=/home/user/builds/master \
    --enable-depend --enable-cassert --enable-tap-tests \
    CPPFLAGS="-O0 -ggdb3 -DUSE_VALGRIND"

and then the usual "make check" or whatever.

The patch has a hardcoded path to the .supp file, and places the
valgrind log into /tmp. It has worked for me fine up until that commit,
and it still seems to be working in every other test directory.

>> t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
>> declared and done_testing() was not seen.
>> # Looks like your test exited with 29 just after 4.
>> t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424,
>> 0x1d00)
>> All 4 subtests passed
>>
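For reference, the "wstat 7424, 0x1d00" in the output above is a raw wait status; on POSIX systems the exit code sits in its high byte and the terminating signal (if any) in the low 7 bits. A minimal sketch of the decoding, assuming the usual Linux layout (the variable names are purely illustrative):

```shell
# Decode the wait status reported by prove (wstat 7424, 0x1d00).
wstat=7424

# High byte: exit code of the test script.
exit_code=$(( (wstat >> 8) & 0xff ))

# Low 7 bits: terminating signal, 0 if the process exited on its own.
signal=$(( wstat & 0x7f ))

printf 'hex=0x%04x exit=%d signal=%d\n' "$wstat" "$exit_code" "$signal"
```

So the numbers are consistent with each other: 7424 is 0x1d00, and 0x1d is the exit code 29 from the "Looks like your test exited with 29" line, with no signal involved.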
>>
>> and tmp_check/log/regress_log_001_connection_limits says:
>>
>>
>> [23:48:44.444](1.129s) ok 3 - reserved_connections limit
>> [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
>> process ended prematurely at
>> /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/
>> PostgreSQL/Test/BackgroundPsql.pm
>> line 154.
>> # Postmaster PID for node "primary" is 198592
>>
>>
>> That BackgroundPsql.pm line is this in wait_connect()
>>
>>    $self->{run}->pump()
>>      until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
>>
>> By trial and error I found that it fails on this line 70:
>>
>>    push(@sessions, background_psql_as_user('regress_superuser'));
>>
>> but I have no idea why. There are multiple similar calls a couple
>> lines earlier, and those work fine. And various other TAP tests with
>> background_psql() work fine too.
>>
>> So what's so special about this particular line?
>
> Weird. Valgrind makes everything slow; is it a timeout? Any other clues
> in the logs?
>

Yeah, weird.

Timeouts were the first thing I thought about, but it fails even if I
set PGCTLTIMEOUT/PG_TEST_TIMEOUT_DEFAULT to 3600. And it doesn't seem to
be waiting for anything for that long :-(

regards

--
Tomas Vondra

Attachment: valgrind-master.patch (text/x-patch, 2.2 KB)
