From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Date: 2024-09-16 14:19:29
Message-ID: 0f27b64b-5bf3-4140-98b7-635e312e1796@vondra.me

On 9/16/24 15:11, Jakub Wartak wrote:
> On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
>> [..]
>
>> Anyway, at this point I'm quite happy with this improvement. I didn't
>> have a clear plan for when to commit it, but I'm considering doing so
>> sometime next week, unless someone objects or asks for additional
>> benchmarks etc.
>
> Thank you very much for working on this :)
>
> The only thing that comes to my mind is that we could blow up the L2
> caches. Fun fact: if we are growing PGPROC by 6.3x, that's going to
> be one or two extra 2MB huge pages at the common max_connections=1000
> on x86_64 (830kB -> ~5.1MB), and indeed:
>
> # without patch:
> postgres(at)hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C
> shared_memory_size_in_huge_pages
> 177
>
> # with patch:
> postgres(at)hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C
> shared_memory_size_in_huge_pages
> 178
>
> So, playing Devil's advocate, the worst situation that could possibly
> hurt (?) would be:
> * memory size of the PGPROC working set >> L2_cache (thus very high
> max_connections),
> * an insane number of active sessions on CPU (sessions >> VCPUs) -
> sadly, it happens to some,
> * those sessions not competing for the same OIDs - just fetching this
> new, big fpLockBits[] structure - so probing a lot, for lots of OIDs,
> but *NOT* having to use the futex() syscall [so not paying that
> syscall price],
> * no huge pages (to cause dTLB misses),
>
> then maybe(?) one could observe further degradation in the dTLB-miss
> perf-stat counters under some microbenchmark, but measuring that
> requires isolated physical hardware. Or maybe it would just be noise
> from the overhead of the context switches themselves. Just trying to
> think out loud about what a big PGPROC could cause here. But this is
> already an unhealthy, non-steady state of the system, so IMHO we are
> good, unless someone comes up with a better (more evil) idea.
>
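
To put those numbers in perspective, the growth is easy to reconstruct.
Here's a standalone back-of-the-envelope sketch (plain C, not actual
PostgreSQL code; the 64 groups of 16 slots and the struct sizes are my
assumptions, picked to roughly match the ~6.3x figure above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* All sizes are assumptions for a typical x86_64 build, not exact. */
    size_t pgproc = 832;        /* approx. sizeof(PGPROC) before the patch */
    int    groups = 64;         /* assumed number of fast-path groups */
    int    slots  = 16;         /* fast-path slots per group */
    size_t per_group = sizeof(uint64_t)          /* fpLockBits per group */
                     + slots * sizeof(uint32_t); /* fpRelId, one Oid/slot */
    int    conns = 1000;

    size_t per_backend = pgproc + groups * per_group;

    printf("per-backend: %zu B (%.1fx of bare PGPROC)\n",
           per_backend, (double) per_backend / pgproc);
    printf("total at %d connections: %.1f MB\n",
           conns, (double) (per_backend * conns) / 1e6);
    return 0;
}

That works out to ~5.4MB at 1000 connections, in the same ballpark as
the 830kB -> ~5.1MB jump above - only a couple of extra 2MB pages
before rounding, hence the 177 -> 178 change.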

I've been thinking about such cases too, but I don't think it can really
happen in practice, because:

- How likely is it that the sessions will need a lot of OIDs, but not
the same ones? And why would it matter that the OIDs are not the same?
I don't think it does, unless one of the sessions needs an exclusive
lock, at which point the optimization doesn't apply anyway (see the
eligibility check quoted after this list).

- If having more fast-path slots means we don't fit into the L2 cache,
would we fit into L2 without them? I don't think so - if there really
are that many locks, we'd have to put them into the shared lock table
instead, and there's a lot of other stuff to keep in memory anyway
(relcaches, ...).
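
For context, fast-path locking only ever covers weak relation locks in
the current database anyway - quoting the eligibility check in
src/backend/storage/lmgr/lock.c from memory (so this may not match the
tree exactly):

#define EligibleForRelationFastPath(locktag, mode) \
    ((locktag)->locktag_lockmethodid == DEFAULT_LOCKMETHOD && \
     (locktag)->locktag_type == LOCKTAG_RELATION && \
     (locktag)->locktag_field1 == MyDatabaseId && \
     MyDatabaseId != InvalidOid && \
     (mode) < ShareUpdateExclusiveLock)

Anything at ShareUpdateExclusiveLock or stronger goes through the
shared lock table regardless, so the contention story there is
unchanged.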

This is pretty much one of the cases I focused on in my benchmarking,
and I have yet to see any regression there.
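
FWIW, if someone wants to chase the dTLB angle on real hardware,
something along these lines against a busy backend should show it (a
hypothetical invocation; the exact event names vary by CPU):

perf stat -e dTLB-loads,dTLB-load-misses,context-switches \
    -p <backend_pid> -- sleep 10

If the miss rate doesn't move between patched and unpatched builds
under the same workload, the L2/dTLB worry is moot.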

>> I did look at the docs to see if anything needs updating, but I don't
>> think so. The SGML docs only talk about fast-path locking at a fairly
>> high level, not about how many slots we have etc.
>
> Well, the only thing I could think of was to add to the
> doc/src/sgml/config.sgml entry for the "max_locks_per_transaction" GUC
> that "it is also used as an advisory value for the number of groups in
> the lock manager's fast-path implementation" (that is, without going
> into further discussion, as even the pg_locks discussion in
> doc/src/sgml/system-views.sgml simply uses that term).
>

Thanks, I'll consider mentioning this in the max_locks_per_transaction
docs. Also, I think there's a place that calculates the amount of
per-connection memory, so maybe that needs updating too.
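
Tangentially, for anyone who wants to check which locks actually take
the fast path, pg_locks already exposes it:

SELECT locktype, relation::regclass, mode, fastpath
  FROM pg_locks
 WHERE pid = pg_backend_pid();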

regards

--
Tomas Vondra
