Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Date: 2023-11-10 04:46:29
Message-ID: CAFiTN-v3b3mkBx2Hgg76xnOhRjgKOzSkoGtbC1yUyZt+S0qz3w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 9, 2023 at 9:39 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Nov 8, 2023 at 6:41 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > Here is the updated version of the patch, here I have taken the
> > approach suggested by Andrey and I discussed the same with Alvaro
> > offlist and he also agrees with it. So the idea is that we will keep
> > the bank size fixed which is 16 buffers per bank and the allowed GUC
> > value for each slru buffer must be in multiple of the bank size. We
> > have removed the centralized lock but instead of one lock per bank, we
> > have kept the maximum limit on the number of bank locks which is 128.
> > We kept the max limit as 128 because, in one of the operations (i.e.
> > ActivateCommitTs), we need to acquire all the bank locks (but this is
> > not a performance path at all) and at a time we can acquire a max of
> > 200 LWlocks, so we think this limit of 128 is good. So now if the
> > number of banks is <= 128 then we will be using one lock per bank
> > otherwise the one lock may protect access of buffer in multiple banks.
>
> Just so I understand, I guess this means that an SLRU is limited to
> 16*128 = 2k buffers = 16MB?

Not really, because 128 is the maximum limit on the number of bank
locks, not on the number of banks. So, for example, if you have 16*128
= 2k buffers then each lock will protect one bank, and likewise if you
have 16 * 512 = 8k buffers then each lock will protect 4 banks. In
short, we can get the lock for each bank with a simple computation
(banklockno = bankno % 128).
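
To make the mapping concrete, here is a minimal standalone C sketch of
that computation (the constants and function names are illustrative,
not the ones used in the patch):

/*
 * Illustrative sketch of the page -> bank -> lock mapping described
 * above, assuming a fixed bank size of 16 buffers and at most 128
 * bank LWLocks.  Not the patch's actual code.
 */
#include <stdio.h>

#define SLRU_BANK_SIZE      16      /* buffers per bank, fixed */
#define SLRU_MAX_BANKLOCKS  128     /* upper limit on bank locks */

/* Which bank a page maps to, given the configured number of buffers. */
static int
slru_bankno(int pageno, int nbuffers)
{
    int     nbanks = nbuffers / SLRU_BANK_SIZE;

    return pageno % nbanks;
}

/*
 * Which lock protects a bank: one lock per bank while the number of
 * banks is <= 128, otherwise one lock covers several banks.
 */
static int
slru_banklockno(int bankno)
{
    return bankno % SLRU_MAX_BANKLOCKS;
}

int
main(void)
{
    int     nbuffers = 16 * 512;    /* 8k buffers => 512 banks */
    int     bankno = slru_bankno(1000, nbuffers);

    /* With 512 banks, each of the 128 locks covers 4 banks. */
    printf("page 1000 -> bank %d -> lock %d\n",
           bankno, slru_banklockno(bankno));
    return 0;
}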

> When we were talking about this earlier, I suggested fixing the number
> of banks and allowing the number of buffers per bank to scale
> depending on the setting. That seemed simpler than allowing both the
> number of banks and the number of buffers to vary, and it might allow
> the compiler to optimize some code better, by converting a calculation
> like page_no%number_of_banks into a masking operation like page_no&0xf
> or whatever. However, because it allows an individual bank to become
> arbitrarily large, it more or less requires us to use a buffer mapping
> table. Some of the performance problems mentioned could be alleviated
> by omitting the hash table when the number of buffers per bank is
> small, and we could also create the dynahash with a custom hash
> function that just does modular arithmetic on the page number rather
> than a real hashing operation. However, maybe we don't really need to
> do any of that. I agree that dynahash is clunky on a good day. I
> hadn't realized the impact would be so noticeable.

Yes, one idea is to keep the number of banks fixed; with that, as you
correctly pointed out, the bank size can get quite big when there is a
large number of buffers, and then we would need a hash table. OTOH,
what I am doing here is keeping the bank size fixed and small (16
buffers per bank), so we can have a large number of banks when the
buffer pool is quite big. Having more banks is not really a problem;
the problem would be growing the number of locks beyond a certain
limit, because in some corner cases we need to acquire all the locks
together and there is a cap on how many LWLocks can be held at once.
So I like this idea of sharing locks across the banks: 1) we can have
enough locks that lock contention or cache invalidation due to a
common lock should no longer be a problem, 2) with a small bank size
the sequential search within a bank is quite fast, so reads are fast,
and 3) with a small bank size the victim buffer search, which has to
be sequential, is also quite fast.
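
For illustration, here is a minimal sketch of what the per-bank
sequential lookup and victim search look like with a fixed 16-slot
bank (the structure and the LRU field are simplified assumptions, not
the patch's actual data structures):

/*
 * Simplified sketch, not the patch: with a fixed 16-slot bank, both
 * the page lookup and the victim search are short linear scans over
 * the bank's slots, done while holding that bank's lock.
 */
#include <stdio.h>

#define SLRU_BANK_SIZE  16

typedef struct BankSlot
{
    int     pageno;         /* page cached in this slot, -1 if empty */
    int     lru_count;      /* lower value = less recently used */
} BankSlot;

/* Return the slot index holding pageno, or -1 if it is not cached. */
static int
bank_lookup(const BankSlot *bank, int pageno)
{
    for (int i = 0; i < SLRU_BANK_SIZE; i++)
    {
        if (bank[i].pageno == pageno)
            return i;
    }
    return -1;
}

/* Pick a victim slot: an empty one if any, else the least recently used. */
static int
bank_choose_victim(const BankSlot *bank)
{
    int     best = 0;

    for (int i = 0; i < SLRU_BANK_SIZE; i++)
    {
        if (bank[i].pageno == -1)
            return i;
        if (bank[i].lru_count < bank[best].lru_count)
            best = i;
    }
    return best;
}

int
main(void)
{
    BankSlot    bank[SLRU_BANK_SIZE];

    for (int i = 0; i < SLRU_BANK_SIZE; i++)
    {
        bank[i].pageno = -1;
        bank[i].lru_count = 0;
    }
    bank[3].pageno = 42;            /* pretend page 42 is cached in slot 3 */

    printf("lookup(42) = %d, victim = %d\n",
           bank_lookup(bank, 42), bank_choose_victim(bank));
    return 0;
}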

> This proposal takes the opposite approach of fixing the number of
> buffers per bank, letting the number of banks vary. I think that's
> probably fine, although it does reduce the effective associativity of
> the cache. If there are more hot buffers in a bank than the bank size,
> the bank will be contended, even if other banks are cold. However,
> given the way SLRUs are accessed, it seems hard to imagine this being
> a real problem in practice. There aren't likely to be say 20 hot
> buffers that just so happen to all be separated from one another by a
> number of pages that is a multiple of the configured number of banks.
> And in the seemingly very unlikely event that you have a workload that
> behaves like that, you could always adjust the number of banks up or
> down by one, and the problem would go away. So this seems OK to me.

I agree with this.
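
To make that associativity point concrete, here is a trivial check
(hypothetical numbers, not from the patch) showing that pages spaced
exactly nbanks apart always map to the same bank, so more than 16 such
hot pages would compete for the same 16 slots:

#include <assert.h>

int
main(void)
{
    int     nbanks = 64;        /* e.g. 16 buffers * 64 banks = 1k buffers */
    int     pageno = 12345;

    /*
     * Pages pageno, pageno + nbanks, pageno + 2*nbanks, ... all land
     * in the same bank under the pageno % nbanks mapping.
     */
    for (int k = 0; k < 20; k++)
        assert((pageno + k * nbanks) % nbanks == pageno % nbanks);
    return 0;
}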

> I also agree with a couple of points that Alvaro made, specifically
> that (1) this doesn't have to be perfect, just better than now and (2)
> separate GUCs for each SLRU is fine. On the latter point, it's worth
> keeping in mind that the cost of a GUC that most people don't need to
> tune is fairly low. GUCs like work_mem and shared_buffers are
> "expensive" because everybody more or less needs to understand what
> they are and how to set them and getting the right value can be tricky --
> but a GUC like autovacuum_naptime is a lot cheaper, because almost
> nobody needs to change it. It seems to me that these GUCs will fall
> into the latter category. Users can hopefully just ignore them except
> if they see a contention on the SLRU bank locks -- and then they can
> consider increasing the number of banks for that particular SLRU. That
> seems simple enough. As with autovacuum_naptime, there is a danger
> that people will configure a ridiculous value of the parameter for no
> good reason and get bad results, so it would be nice if someday we had
> a magical system that just got all of this right without the user
> needing to configure anything. But in the meantime, it's better to
> have a somewhat manual system to relieve pressure on these locks than
> no system at all.

+1

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
