Re: scalability bottlenecks with (many) partitions (and more)

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Date: 2024-09-16 13:11:31
Message-ID: CAKZiRmze8L6FeAKNQhZkn4NmUH2WZZyQ46GUm-Xuky5qr5DUoA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> [..]

> Anyway, at this point I'm quite happy with this improvement. I didn't
> have any clear plan when to commit this, but I'm considering doing so
> sometime next week, unless someone objects or asks for some additional
> benchmarks etc.

Thank you very much for working on this :)

The only concern that comes to my mind is that we could blow up the
L2 caches. Fun fact: if we grow PGPROC by 6.3x, that's going to be
like one or two additional 2MB huge pages at the common
max_connections=1000 on x86_64 (830kB -> ~5.1MB), and indeed:

# without patch:
postgres(at)hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C shared_memory_size_in_huge_pages
177

# with patch:
postgres(at)hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C shared_memory_size_in_huge_pages
178
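
The rough arithmetic behind those numbers can be sketched like this
(the sizes are my own estimates from above, not measured from a
build; the observed counter grows by only one page because the whole
shared segment is rounded up as a single unit):

```python
# Back-of-envelope check: a ~6.3x growth of the per-backend fast-path
# state takes the relevant array from ~830 kB to ~5.1 MB at
# max_connections=1000, i.e. a bit over two 2 MB huge pages of extra
# raw size.
HUGE_PAGE = 2 * 1024 * 1024            # x86_64 huge page size in bytes

old_size = 830 * 1024                  # ~830 kB before the patch (estimate)
new_size = int(old_size * 6.3)         # ~5.1 MB with the patch (estimate)

extra = new_size - old_size
print(round(new_size / (1024 * 1024), 1))   # -> 5.1 (MB)
print(extra // HUGE_PAGE)                   # -> 2 whole extra huge pages
```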

So, playing Devil's advocate, the worst situation that could possibly
hurt (?) would be:
* memory size of the PGPROC working set >> L2_cache (thus very high
max_connections),
* insane number of working sessions on CPU (sessions >> vCPUs) - sadly
it happens to some,
* those sessions wouldn't have to be competing for the same OIDs -
just fetching this new, bigger fpLockBits[] structure - so probing a
lot for lots of OIDs, but *NOT* having to use the futex() syscall [so
not paying the syscall price],
* no huge pages (causing dTLB misses),

then maybe(?) one could observe a further increase in dTLB misses in
the perf-stat counters under some microbenchmark, but measuring that
requires isolated physical hardware. And maybe it would actually be
noise from the overhead of the context switches themselves. I'm just
trying to think out loud about what a big PGPROC could cause here.
But this is already an unhealthy and non-steady state of the system,
so IMHO we are good, unless someone comes up with a better (more
evil) idea.
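
For the record, a measurement recipe for that scenario could look
roughly like this (hypothetical session; needs root, isolated
physical hardware, and CPU-specific event names - check `perf list`;
the pid is a placeholder):

```shell
# Sample one busy backend, with and without the patch, and with
# huge_pages on/off, then compare dTLB miss rates.
EVENTS="dTLB-loads,dTLB-load-misses,context-switches"
BACKEND_PID=12345   # placeholder: pid of one postgres backend

# The actual run (commented out so this sketch is side-effect free):
#   perf stat -e "$EVENTS" -p "$BACKEND_PID" -- sleep 10

echo "perf stat -e $EVENTS -p $BACKEND_PID -- sleep 10"
```

If context-switch counts dominate, the dTLB numbers are probably just
the noise I mentioned above.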

> I did look at docs if anything needs updating, but I don't think so.
> The SGML docs only talk about fast-path locking at fairly high
> level, not about how many we have etc.

Well, the only thing I could think of was to note in
doc/src/sgml/config.sgml, under the "max_locks_per_transaction" GUC,
that "it is also used as advisory for the number of groups used in
the lock manager's fast-path implementation" (that is, without going
into further discussion, as even the pg_locks discussion in
doc/src/sgml/system-views.sgml simply uses that term).
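
The addition could be as small as something like this (a sketch of
possible wording, not committed text; the linkend id is my assumption
from the existing docs):

```sgml
<!-- hypothetical <para> for the max_locks_per_transaction entry in
     doc/src/sgml/config.sgml; wording and linkend are a sketch -->
<para>
 This parameter is also used as an advisory value for the number of
 groups in the lock manager's fast-path implementation (see
 <link linkend="view-pg-locks"><structname>pg_locks</structname></link>).
</para>
```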

-J.
