Re: Thoughts about NUM_BUFFER_PARTITIONS

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thoughts about NUM_BUFFER_PARTITIONS
Date: 2024-08-04 19:32:20
Message-ID: 1B6B9FE6-8B88-4043-A1B0-824B8EEF6785@yandex-team.ru
Lists: pgsql-hackers

One of our customers recently asked me to look into buffer mapping.
What follows is my take on the problem of choosing an optimal NUM_BUFFER_PARTITIONS.

I’ve found some dead code: BufMappingPartitionLockByIndex() is unused, and has been unused for a long time. See patch 1.

> On 23 Feb 2024, at 22:25, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Well, if Postgres Pro implements this, I don't know what their reasoning
> was exactly, but I guess they wanted to make it easier to experiment
> with different values (without rebuild), or maybe they actually have
> systems where they know higher values help ...
>
> Note: I'd point the maximum value 8 translates to 256, so no - it does
> not max at the same value as PostgreSQL.

I’ve prototyped a similar GUC for anyone willing to run such experiments; see patches 2 and 4. I’ll probably run some experiments too, on customers' clusters and workloads :)
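
Tomas's note above (8 translates to 256) implies that the setting is a power-of-two exponent. A minimal sketch of that mapping follows. The function name and the lower bound are hypothetical and are not taken from the attached patches or from Postgres Pro; only the upper bound of 8 comes from the quoted text.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bounds: the maximum of 8 is from the quoted note, the minimum is a guess. */
#define LOG2_PARTITIONS_MIN 0
#define LOG2_PARTITIONS_MAX 8

/*
 * Translate a log2-style setting into a partition count.
 * Returns 0 when the setting is out of range.
 */
static uint32_t
partitions_from_log2(int log2_partitions)
{
    uint32_t n;

    if (log2_partitions < LOG2_PARTITIONS_MIN ||
        log2_partitions > LOG2_PARTITIONS_MAX)
        return 0;

    n = (uint32_t) 1 << log2_partitions;

    /* By construction the result is always a power of two. */
    assert((n & (n - 1)) == 0);
    return n;
}
```

With this encoding a setting of 7 gives the current compile-time value of 128, and 8 gives 256.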

> Anyway, this value is inherently a trade off. If it wasn't, we'd set it
> to something super high from the start. But having more partitions of
> the lock table has a cost too, because some parts need to acquire all
> the partition locks (and that's O(N) where N = number of partitions).

I haven't actually found any such cases. Or perhaps I didn't look hard enough.
pg_buffercache iterates over buffers and releases locks as it goes. See patch 3, which fixes the comments.
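
For reference, the O(N) pattern Tomas describes is code that takes every partition lock at once. A toy sketch of that pattern is below. It uses C11 atomic_flag spinlocks in place of PostgreSQL's LWLocks, and every name in it is hypothetical; none of these are PostgreSQL functions.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NUM_PARTITIONS 128  /* mirrors the current NUM_BUFFER_PARTITIONS */

static atomic_flag toy_partition_locks[TOY_NUM_PARTITIONS];

/* Put every toy lock into the released state before first use. */
static void
toy_init_partition_locks(void)
{
    for (int i = 0; i < TOY_NUM_PARTITIONS; i++)
        atomic_flag_clear(&toy_partition_locks[i]);
}

/*
 * Acquire all partition locks in ascending index order. A fixed order
 * keeps two concurrent "lock everything" callers from deadlocking.
 * The cost is one acquisition per partition, hence O(N).
 */
static int
toy_lock_all_partitions(void)
{
    int acquired = 0;

    for (int i = 0; i < TOY_NUM_PARTITIONS; i++)
    {
        while (atomic_flag_test_and_set_explicit(&toy_partition_locks[i],
                                                 memory_order_acquire))
            ;                   /* spin until the lock is free */
        acquired++;
    }
    assert(acquired == TOY_NUM_PARTITIONS);
    return acquired;
}

/* Release all partition locks in reverse order. */
static int
toy_unlock_all_partitions(void)
{
    int released = 0;

    for (int i = TOY_NUM_PARTITIONS - 1; i >= 0; i--)
    {
        atomic_flag_clear_explicit(&toy_partition_locks[i],
                                   memory_order_release);
        released++;
    }
    return released;
}
```

Doubling the partition count doubles the work in both loops, which is the cost that any remaining "lock everything" caller would pay.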

> Of course, not having enough lock table partitions has a cost too,
> because it increases the chance of conflict between backends (who happen
> to need to operate on the same partition). This constant is not
> constant, it changes over time - with 16 cores the collisions might have
> been rare, with 128 not so much. Also, with partitioning we may need
> many more locks per query.
>
> This means it's entirely possible it'd be good to have more than 128
> partitions of the lock table, but we only change this kind of stuff if
> we have 2 things:
>
> 1) clear demonstration of the benefits (e.g. a workload showing an
> improvement with higher number of partitions)
>
> 2) analysis of how this affects other workloads (e.g. cases that may
> need to lock all the partitions etc)
>
> Ultimately it's a trade off - we need to judge if the impact in (2) is
> worth the improvement in (1).
>
> None of this was done in this thread. There's no demonstration of the
> benefits, no analysis of the impact etc.
>
> As for turning the parameter into a GUC, that has a cost too. Either
> direct - a compiler can do far more optimizations with compile-time
> constants than with values that may change during execution, for
> example.

I think the overhead of finding the partition from the hash is negligibly small.
num_partitions in HTAB controls the number of freelists. This might have some effect.
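
A minimal sketch of the lookup in question, comparing a compile-time partition count with a runtime one. The fixed variant follows the shape of the existing lookup (hash code modulo NUM_BUFFER_PARTITIONS); the runtime variant, its variable, and its setter are hypothetical and only illustrate what a GUC-backed version might look like, assuming the count stays a power of two.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_NUM_PARTITIONS 128    /* compile-time constant, as today */

/*
 * With a constant power-of-two divisor, compilers typically reduce
 * the modulo to a single AND with (FIXED_NUM_PARTITIONS - 1).
 */
static inline uint32_t
partition_fixed(uint32_t hashcode)
{
    return hashcode % FIXED_NUM_PARTITIONS;
}

/* Hypothetical runtime setting, stored as a mask (count - 1). */
static uint32_t runtime_partition_mask = FIXED_NUM_PARTITIONS - 1;

/* The mask trick is only valid for power-of-two partition counts. */
static void
set_runtime_partitions(uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    runtime_partition_mask = n - 1;
}

/* Runtime variant: one extra memory load, then the same AND. */
static inline uint32_t
partition_runtime(uint32_t hashcode)
{
    return hashcode & runtime_partition_mask;
}
```

Under these assumptions the runtime variant differs from the fixed one by a single load of the mask, which is consistent with the overhead being small. Whether it is measurable in practice would need a benchmark.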

> Or indirect - if we can't give users any guidance how/when to
> tune the GUC, it can easily lead to misconfiguration (I can't even count
> how many times I had to deal with systems where the values were "tuned"
> following the logic that more is always better).

Yes, IMHO this argument is the most important one. By adding more such knobs we promote a superstitious approach to tuning.

Best regards, Andrey Borodin.

Attachment                                                       Content-Type              Size
v0-0001-Remove-unused-functions-in-buf_internals.h.patch         application/octet-stream  1.2 KB
v0-0002-GUCify-NUM_BUFFER_PARTITIONS.patch                       application/octet-stream  5.8 KB
v0-0003-Remove-reference-to-pg_buffercache-near-MAX_SIMUL.patch  application/octet-stream  987 bytes
v0-0004-Adjust-dshash.c-comment-about-NUM_BUFFER_PARTITIO.patch  application/octet-stream  906 bytes
