From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Subject: | Re: Flaky 003_start_stop.pl test |
Date: | 2025-03-05 00:41:38 |
Message-ID: | rm3wawhaebopcncwwadnenypl7mwwqacoqj2u35qi36d5zcms2@wbbcb6t6ukf4 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-03-04 09:48:45 +1300, Thomas Munro wrote:
> On Tue, Mar 4, 2025 at 5:48 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2025-02-24 11:26:56 +0100, Jelte Fennema-Nio wrote:
> > > [1]: https://cirrus-ci.com/task/5571017969500160?logs=test_world#L256
> >
> > A possibly related test failure is:
> >
> > https://cirrus-ci.com/task/5005337046024192?logs=check_world#L259
>
> They look different to me:
Thanks for checking!
> (1) In Jelte's report for 003_start_stop, we see:
>
> [03:40:42.347](2.022s) ok 1 - dead-end connection 0
> [03:40:42.348](0.001s) ok 2 - dead-end connection 1
> [03:40:42.349](0.001s) ok 3 - dead-end connection 2
> ... up to 20 ...
>
> We see in the server log that the first one had pid 31660 and slot 1,
> but it never said "SSLRequest rejected" like all the others. The Perl
> code must have received "N", or it would not have moved onto opening
> the next connection. Then just as it is about to try to connect,
> hoping to see "too many clients", that one exits, the "invalid_role"
> connection gets slot 1 instead of the dead-end backend we expected:
>
> 03:40:42.375 UTC [31645][postmaster] DEBUG: client backend (PID
> 31660) exited with exit code 0 <-- goodbye, silent backend
> 03:40:42.480 UTC [31681][not initialized] [[unknown]][:0] LOG:
> connection received: host=[local]
> 03:40:42.480 UTC [31681][not initialized] [[unknown]][:0] LOG:
> SSLRequest rejected
> 03:40:42.511 UTC [31645][postmaster] DEBUG: assigned pm child slot 1
> for backend <-- we get silent backend's slot
> 03:40:42.512 UTC [31645][postmaster] DEBUG: forked new client
> backend, pid=31690 socket=7
> 03:40:42.542 UTC [31690][not initialized] [[unknown]][:0] LOG:
> connection received: host=[local]
> 03:40:42.543 UTC [31690][client backend] [[unknown]][0/1:0] LOG:
> connection authenticated: user="invalid_user" method=trust
> (/tmp/cirrus-ci-build/build/testrun/postmaster/003_start_stop/data/t_003_start_stop_main_data/pgdata/pg_hba.conf:117)
> 03:40:42.543 UTC [31690][client backend] [[unknown]][0/1:0] LOG:
> connection authorized: user=invalid_user database=postgres
> application_name=003_start_stop.pl
> 03:40:42.543 UTC [31690][client backend] [[unknown]][0/1:0] FATAL:
> role "invalid_user" does not exist
Good catch.
The pattern is similar in other failed tests. There has been a spate of errors
on postgres/postgres' CI, which made me look at this again.
> Could the backend have sent an "N" (NoticeResponse) unrelated to SSL
> negotiation?
I don't think that'd be legal protocol flow from the server side. And, given
that the test uses log_min_messages=debug2, should have been logged in the
server log.
All the recent failures were on FreeBSD, interestingly.
I wonder if we should, at least temporarily, increase the test's log level?
I also suspect that it'd be worthwhile to add a few more if
(Trace_connection_negotiation) messages.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Jacob Champion | 2025-03-05 00:53:14 | Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible |
Previous Message | Euler Taveira | 2025-03-05 00:33:39 | Re: log_min_messages per backend type |