RE: Connection limits/permissions, slotsync workers, etc

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: RE: Connection limits/permissions, slotsync workers, etc
Date: 2024-12-27 12:12:11
Message-ID: OS0PR01MB571674878E4B21654F94A7FF940E2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, December 26, 2024 3:50 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

Hi,

> In connection with the discussion at [1], I started to look at exactly which server
> processes ought to be subject to connection limits (datconnlimit,
> ACL_CONNECT, and related checks). The current situation seems to be an
> inconsistent mess.
>
> Looking at InitPostgres, InitializeSessionUserId, and CheckMyDatabase, we
> have several different permissions and resource-limiting checks that may be
> enforced against an incoming process:
> ReservedConnections/SuperuserReservedConnections
> rolcanlogin
> rolconnlimit
> datallowconn
> datconnlimit
> database's ACL_CONNECT privilege
>
> I want to argue that ReservedConnections, rolconnlimit, and datconnlimit
> should only be applied to regular backends. It makes no particular sense to
> enforce them against autovac workers, background workers, or wal senders,
> because each of those types of processes has its own resource-limiting
> PGPROC pool. It's especially bad to enforce them against parallel workers,
> since that creates edge-case failures that make the use of parallelism not
> transparent to applications. We hacked around that at 73c9f91a1, but I think
> that should be reverted in favor of not applying the check at all.
>
> I further argue that rolcanlogin, datallowconn, and ACL_CONNECT should not
> be checked in a parallel worker, again primarily on the grounds that this creates
> parallelism-specific failures (cf [1]).
> The two scenarios where this occurs are (1) permissions were revoked since
> the leader process connected, or (2) leader is currently running as a role that
> wouldn't have permission to connect on its own. We don't attempt to kick out
> the leader process when either of those things happen, so why should we
> prevent it from using parallelism?
>
> The current situation for each of these checks is:
>
> ReservedConnections is enforced if !am_superuser && !am_walsender, so it is
> enforced against non-superuser background workers, which is silly because
> BG workers have their own PGPROC pool; moreover, what's the argument for
> letting walsenders but not other kinds of background processes escape this?
> I propose changing it to apply only to regular backends.
>
> rolcanlogin is enforced if IsUnderPostmaster and we reach
> InitializeSessionUserId, which basically reduces to regular backends, parallel
> workers, logrep workers, and walsenders. Seems reasonable for logrep
> workers and walsenders which represent fresh logins, but not for parallel
> workers. I propose fixing this by making ParallelWorkerMain pass
> BGWORKER_BYPASS_ROLELOGINCHECK.
>
> rolconnlimit is enforced if IsUnderPostmaster and we reach
> InitializeSessionUserId and it's not a superuser. So that applies to
> non-superuser parallel workers, logrep workers, and walsenders, and I don't
> think it's reasonable to apply it to any of them since those all come out of other
> PGPROC pools. I propose switching that to apply only to regular backends.
>
> BTW, I kind of wonder why rolconnlimit is ineffectual for superusers, especially
> when rolcanlogin does apply to them. Not a bug exactly, but it sure seems
> inconsistent. If you've taken the trouble to set it you'd expect it to work. Shall
> we take out the !is_superuser check?
>
> datallowconn is enforced against all non-standalone, non-AV-worker
> processes that connect to a specific database, except bgworkers that pass
> BGWORKER_BYPASS_ALLOWCONN (nothing in core except test module).
> So again that includes parallel workers, logrep workers, and walsenders.
> Again this seems reasonable for logrep workers and walsenders but not for
> parallel workers. I propose fixing this by making ParallelWorkerMain pass
> BGWORKER_BYPASS_ALLOWCONN.
>
> datconnlimit is enforced against all non-superuser processes, including
> per-DB walsenders and BG workers (see above).
> This is fairly dubious given that they have their own PGPROC pools.
> I propose switching that to apply only to regular backends, too.
>
> ACL_CONNECT is enforced against all non-superuser processes, including
> per-DB walsenders and BG workers (includes parallel workers, subscription
> apply workers, logrep workers).
> Perhaps that's OK for most, but I argue not for parallel workers; maybe skip it if
> BGWORKER_BYPASS_ALLOWCONN?
>
> Also, the enforcement of datconnlimit and rolconnlimit is inconsistent in
> another way: our counting of the pre-existing processes is pretty random.
> CountDBConnections is not consistent with either the current set of processes
> that datconnlimit is enforced against, or my proposal to enforce it only against
> regular backends. It counts anything that is not
> AmBackgroundWorkerProcess, including AV workers and per-DB walsenders.
> I think it should count only regular backends, because anything else leads to
> weird inconsistencies in whether a rejection occurs.
>
> The same applies to CountUserBackends (used for rolconnlimit check).
> I argue these two functions should count only regular backends, and the
> enforcement should likewise be only against regular backends.

Personally, I find these proposals to be reasonable.

> Another recently-created problem is that the "slotsync worker"
> process type we added in v17 hasn't been considered in any of this.
> In particular, unlike every other process type that can obtain a PGPROC, we did
> not consider it in the MaxBackends calculation:
>
> /* the extra unit accounts for the autovacuum launcher */
> MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> max_worker_processes + max_wal_senders;
>
> This means AFAICS that it effectively competes for one of the MaxConnections
> PGPROC slots, meaning that it could fail for lack of a slot or could lock out a
> client that should have been able to connect.
> Shouldn't we have added another dedicated PGPROC slot for it?
> (proc.c would require some additional work to make that happen.)

Thanks for reporting the issue! I can reproduce the issue E.g., slotsync worker
cannot start when user backends have reached the max_connections limit. And I
agree that it would be better to add another dedicated PGPROC for it. I have
prepared a patch attached to address this.

> I wonder if the AV launcher and slotsync worker could be reclassified as "auxiliary
> processes" instead of being their own weird animal.

It appears that the current aux processes do not run transactions as stated in the
comments[1], so we may need to somehow release this restriction to achieve the
goal.

> [1]
> https://www.postgresql.org/message-id/8befc845430ba1ae3748af900af2987
> 88e579c89.camel%40cybertec.at
>

Best Regards,
Hou zj

Attachment Content-Type Size
v1-0001-Reserve-a-dedicated-PGPROC-slot-for-slotsync-work.patch application/octet-stream 4.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladlen Popolitov 2024-12-27 12:44:21 Bug and memory leaks with access to file links with long names (Windows, MSVS)
Previous Message Andrey Borodin 2024-12-27 11:05:07 Re: Fix bank selection logic in SLRU