Connection limits/permissions, slotsync workers, etc

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Connection limits/permissions, slotsync workers, etc
Date: 2024-12-25 19:49:50
Message-ID: 1808397.1735156190@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In connection with the discussion at [1], I started to look at
exactly which server processes ought to be subject to connection
limits (datconnlimit, ACL_CONNECT, and related checks). The
current situation seems to be an inconsistent mess.

Looking at InitPostgres, InitializeSessionUserId, and CheckMyDatabase,
we have several different permissions and resource-limiting checks
that may be enforced against an incoming process:
ReservedConnections/SuperuserReservedConnections
rolcanlogin
rolconnlimit
datallowconn
datconnlimit
database's ACL_CONNECT privilege

I want to argue that ReservedConnections, rolconnlimit, and
datconnlimit should only be applied to regular backends. It
makes no particular sense to enforce them against autovac workers,
background workers, or wal senders, because each of those types of
processes has its own resource-limiting PGPROC pool. It's especially
bad to enforce them against parallel workers, since that creates
edge-case failures that make the use of parallelism not transparent
to applications. We hacked around that at 73c9f91a1, but I think
that should be reverted in favor of not applying the check at all.

I further argue that rolcanlogin, datallowconn, and ACL_CONNECT
should not be checked in a parallel worker, again primarily on the
grounds that this creates parallelism-specific failures (cf [1]).
The two scenarios where this occurs are (1) permissions were
revoked since the leader process connected, or (2) leader is
currently running as a role that wouldn't have permission to
connect on its own. We don't attempt to kick out the leader
process when either of those things happen, so why should we
prevent it from using parallelism?

The current situation for each of these checks is:

ReservedConnections is enforced if !am_superuser && !am_walsender,
so it is enforced against non-superuser background workers,
which is silly because BG workers have their own PGPROC pool;
moreover, what's the argument for letting walsenders but not other
kinds of background processes escape this? I propose changing it to
apply only to regular backends.

rolcanlogin is enforced if IsUnderPostmaster and we reach
InitializeSessionUserId, which basically reduces to regular backends,
parallel workers, logrep workers, and walsenders. Seems reasonable
for logrep workers and walsenders which represent fresh logins, but
not for parallel workers. I propose fixing this by making
ParallelWorkerMain pass BGWORKER_BYPASS_ROLELOGINCHECK.

rolconnlimit is enforced if IsUnderPostmaster and we reach
InitializeSessionUserId and it's not a superuser. So that applies
to non-superuser parallel workers, logrep workers, and walsenders,
and I don't think it's reasonable to apply it to any of them since
those all come out of other PGPROC pools. I propose switching that
to apply only to regular backends.

BTW, I kind of wonder why rolconnlimit is ineffectual for superusers,
especially when rolcanlogin does apply to them. Not a bug exactly,
but it sure seems inconsistent. If you've taken the trouble to set it
you'd expect it to work. Shall we take out the !is_superuser check?

datallowconn is enforced against all non-standalone, non-AV-worker
processes that connect to a specific database, except bgworkers that
pass BGWORKER_BYPASS_ALLOWCONN (nothing in core except test module).
So again that includes parallel workers, logrep workers, and
walsenders. Again this seems reasonable for logrep workers and
walsenders but not for parallel workers. I propose fixing this by
making ParallelWorkerMain pass BGWORKER_BYPASS_ALLOWCONN.

datconnlimit is enforced against all non-superuser processes,
including per-DB walsenders and BG workers (see above).
This is fairly dubious given that they have their own PGPROC pools.
I propose switching that to apply only to regular backends, too.

ACL_CONNECT is enforced against all non-superuser processes,
including per-DB walsenders and BG workers (includes
parallel workers, subscription apply workers, logrep workers).
Perhaps that's OK for most, but I argue not for parallel workers;
maybe skip it if BGWORKER_BYPASS_ALLOWCONN?

Also, the enforcement of datconnlimit and rolconnlimit is inconsistent
in another way: our counting of the pre-existing processes is pretty
random. CountDBConnections is not consistent with either the current
set of processes that datconnlimit is enforced against, or my proposal
to enforce it only against regular backends. It counts anything that
is not AmBackgroundWorkerProcess, including AV workers and per-DB
walsenders. I think it should count only regular backends, because
anything else leads to weird inconsistencies in whether a rejection
occurs.

The same applies to CountUserBackends (used for rolconnlimit check).
I argue these two functions should count only regular backends,
and the enforcement should likewise be only against regular backends.

Another recently-created problem is that the "slotsync worker"
process type we added in v17 hasn't been considered in any of this.
In particular, unlike every other process type that can obtain
a PGPROC, we did not consider it in the MaxBackends calculation:

/* the extra unit accounts for the autovacuum launcher */
MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
max_worker_processes + max_wal_senders;

This means AFAICS that it effectively competes for one of the
MaxConnections PGPROC slots, meaning that it could fail for lack of a
slot or could lock out a client that should have been able to connect.
Shouldn't we have added another dedicated PGPROC slot for it?
(proc.c would require some additional work to make that happen.)
I wonder if the AV launcher and slotsync worker could be reclassified
as "auxiliary processes" instead of being their own weird animal.

regards, tom lane

[1] https://www.postgresql.org/message-id/8befc845430ba1ae3748af900af298788e579c89.camel%40cybertec.at

Browse pgsql-hackers by date

  From Date Subject
Next Message Michel Pelletier 2024-12-25 20:25:18 Re: Using Expanded Objects other than Arrays from plpgsql
Previous Message Tomas Vondra 2024-12-25 17:25:50 Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)