Re: scalability bottlenecks with (many) partitions (and more)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Date: 2024-09-03 15:06:32
Message-ID: CA+TgmobCBse3LYj9pfh-ELbP0VnffQS7jP_vNTGA7-KZP0gqDw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> The one argument against tying this to max_locks_per_transaction is the
> vastly different "per element" memory requirements. If you add one entry
> to max_locks_per_transaction, that adds a LOCK, which is a whopping 152B.
> OTOH one fast-path entry is ~5B, give or take. That's a pretty big
> difference, and if the locks fit into the shared lock table but you'd
> like to allow more fast-path locks, having to increase
> max_locks_per_transaction is not great - pretty wasteful.
>
> OTOH I'd really hate to just add another GUC and hope the users will
> magically know how to set it correctly. That's pretty unlikely, IMO; I
> myself wouldn't know what a good value would be.
>
> But say we add a GUC and set it to -1 by default, in which case it just
> inherits the max_locks_per_transaction value. And then also provide some
> basic metric about this fast-path cache, so that people can tune this?

All things being equal, I would prefer not to add another GUC for
this, but we might need it.
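
If we do end up adding one, the -1-means-inherit behavior described
above is at least simple to reason about. A minimal sketch, with an
invented setting name (none of this is actual PostgreSQL code):

/*
 * Hypothetical "fast_path_lock_slots" setting: -1 by default, and a
 * negative value just inherits max_locks_per_transaction.  The name and
 * the function are made up for illustration.
 */
static int
effective_fast_path_slots(int fast_path_lock_slots,
                          int max_locks_per_transaction)
{
    if (fast_path_lock_slots < 0)       /* -1 means "inherit" */
        return max_locks_per_transaction;

    return fast_path_lock_slots;
}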

Doing some worst case math, suppose somebody has max_connections=1000
(which is near the upper limit of what I'd consider a sane setting)
and max_locks_per_transaction=10000 (ditto). The product is 10
million, so every 10 bytes of per-entry storage costs about 100MB of
RAM. At 152 bytes per shared LOCK entry that's roughly 1.5GB, versus
roughly 50MB for ~5-byte fast-path entries, and chewing up 1.5GB when
you could have chewed up 50MB certainly isn't too great. On the other
hand, those values are kind of pushing the limits of what is actually
sane. If you imagine max_locks_per_transaction=2000 rather than
max_locks_per_transaction=10000, then it's only about 0.3GB, and
that's hopefully not a lot on the hopefully-giant machine where
you're running this.
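
To keep that arithmetic honest, here it is as a trivial C program
using the per-entry sizes quoted above (152 bytes per shared LOCK
entry, ~5 bytes per fast-path slot). This is just arithmetic, not
anything from the tree:

#include <stdio.h>

int
main(void)
{
    long max_connections = 1000;
    long max_locks_per_transaction = 10000;
    long entries = max_connections * max_locks_per_transaction; /* 10M */

    printf("shared lock table: %.2f GB\n", entries * 152.0 / 1e9);
    printf("fast-path slots:   %.2f GB\n", entries * 5.0 / 1e9);
    return 0;
}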

> I think just knowing the "hit ratio" would be enough, i.e. counters for
> how often a lock fits into the fast-path array and how often we have to
> promote it to the shared lock table, no?

Yeah, probably. I mean, that won't tell you how big it needs to be,
but it will tell you whether it's big enough.
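
On the counting side, something along these lines would probably do.
This is a sketch with invented names, not actual PostgreSQL code or
any existing statistics view:

#include <stdbool.h>
#include <stdint.h>

/* Counters from which a fast-path "hit ratio" could be computed. */
typedef struct FastPathLockStats
{
    uint64_t fastpath_acquired;  /* lock fit into the fast-path array */
    uint64_t shared_acquired;    /* lock went into the shared lock table */
} FastPathLockStats;

static FastPathLockStats fp_stats;

static void
count_lock_acquire(bool used_fast_path)
{
    if (used_fast_path)
        fp_stats.fastpath_acquired++;
    else
        fp_stats.shared_acquired++;
}

/* hit ratio = fastpath_acquired / (fastpath_acquired + shared_acquired) */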

I wonder if we should be looking at further improvements in the lock
manager of some kind. For instance, imagine if we allocated storage
via DSM or DSA for cases where we need a really large number of LOCK
entries. The downside of that is that we might run out of memory for
locks at runtime, which would perhaps suck, but you'd probably use
significantly less memory on average. Or, maybe we need an even bigger
rethink where we reconsider the idea that we take a separate lock for
every single partition instead of having some kind of hierarchy-aware
lock manager. I don't know. But this feels like very old, crufty tech.
There's probably something more state of the art that we could or
should be doing.
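
Just to illustrate the shape of the hierarchy-aware idea, here is a
toy sketch with invented types (not a design, and certainly not code
from the tree): the point would be to take one lock on the topmost
parent rather than a separate lock on every partition.

#include <stddef.h>

typedef struct RelNode
{
    int             relid;
    struct RelNode *parent;     /* NULL for a top-level table */
} RelNode;

/* The single relation we'd lock for a scan that covers "rel". */
static int
hierarchy_lock_target(const RelNode *rel)
{
    while (rel->parent != NULL)
        rel = rel->parent;
    return rel->relid;
}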

--
Robert Haas
EDB: http://www.enterprisedb.com
