Re: RFC: Improve CPU cache locality of syscache searches

From:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: RFC: Improve CPU cache locality of syscache searches
Date:	2021-08-05 16:27:49
Message-ID:	CAFBsxsGkBtEVjjMLZcRQqKxUCZBauoiLBPmH3X-EDSSWd__Yug@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Aug 4, 2021 at 3:44 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> > typedef struct cc_bucket
> > {
> > uint32 hashes[4];
> > catctup *ct[4];
> > dlist_head;
> > };
>
> I'm not convinced that the above is the right idea though. Even if the hash
> matches, you're still going to need to fetch at least catctup->keys[0] from
> a separate cacheline to be able to return the cache entry.

I see your point. It doesn't make sense to inline only part of the
information needed.

> struct cc_bucket_1
> {
> uint32 hashes[3]; // 12
> // 4 bytes alignment padding
> Datum key0s[3]; // 24
> catctup *ct[3]; // 24
> // cacheline boundary
> dlist_head conflicts; // 16
> };
>
> would be better for 1 key values?
>
> It's obviously annoying to need different bucket types for different key
> counts, but given how much 3 unused key Datums waste, it seems worth paying
> for?
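The byte counts in the quoted struct comments can be checked mechanically. The following is a minimal sketch, not PostgreSQL source: the typedefs are stand-ins (a pointer-sized `Datum`, an opaque `catctup`, and a two-pointer `dlist_head`), the helper `cc_bucket_1_conflicts_offset()` is hypothetical, and the static assertions assume a 64-bit platform with 64-byte cachelines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types for illustration only; they are assumptions, not the real headers. */
typedef uint32_t uint32;
typedef uintptr_t Datum;                /* assumed pointer-sized */
typedef struct catctup catctup;         /* opaque; only pointers to it are stored */
typedef struct dlist_node
{
    struct dlist_node *prev;
    struct dlist_node *next;
} dlist_node;
typedef struct dlist_head
{
    dlist_node head;                    /* two pointers: 16 bytes on 64-bit */
} dlist_head;

struct cc_bucket_1
{
    uint32      hashes[3];              /* 12 bytes, then 4 bytes of padding */
    Datum       key0s[3];               /* 24 bytes */
    catctup    *ct[3];                  /* 24 bytes */
    /* assumed 64-byte cacheline boundary falls here */
    dlist_head  conflicts;              /* 16 bytes */
};

#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
/* On a 64-bit build the inlined arrays fill exactly one 64-byte line. */
static_assert(offsetof(struct cc_bucket_1, conflicts) == 64,
              "inlined entries should end at the cacheline boundary");
static_assert(sizeof(struct cc_bucket_1) == 80,
              "bucket should occupy 1 1/4 cachelines");
#endif

/* Hypothetical helper: offset of the overflow list within the bucket. */
size_t
cc_bucket_1_conflicts_offset(void)
{
    return offsetof(struct cc_bucket_1, conflicts);
}
```

Under those assumptions, everything needed to match a single-key entry (hash, first key, entry pointer) sits in the first cacheline, and only the overflow list spills into the second.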

Yeah, it's annoying, but it does make a big difference to keep out unused
Datums:

keys    cachelines
        3 values    4 values

1       1 1/4       1 1/2
2       1 5/8       2
3       2           2 1/2
4       2 3/8       3

Or, looking at it another way, limiting the bucket size to 2 cachelines, we
can fit:

keys values
1 5
2 4
3 3
4 2

Although I'm guessing inlining just two values in the 4-key case wouldn't
buy much.
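For anyone wanting to reproduce the numbers in the two tables above, here is a minimal sketch of the arithmetic. It assumes 64-byte cachelines, 8-byte Datums and pointers, a 16-byte dlist_head, and a hash array padded up to 8-byte alignment; the function names `bucket_bytes()` and `max_values()` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES   64    /* assumed cacheline size */
#define DATUM_BYTES       8     /* assumes a 64-bit Datum */
#define POINTER_BYTES     8     /* assumes 64-bit pointers */
#define DLIST_HEAD_BYTES  16    /* two pointers */

/* Round n up to the next multiple of 8, the alignment of the Datum array. */
size_t
align8(size_t n)
{
    return (n + 7) & ~(size_t) 7;
}

/* Bytes used by a bucket that inlines nvalues entries with nkeys keys each. */
size_t
bucket_bytes(int nkeys, int nvalues)
{
    size_t      hashes;
    size_t      keys;
    size_t      ptrs;

    assert(nkeys >= 1 && nvalues >= 1);

    hashes = align8(4 * (size_t) nvalues);      /* uint32 hashes plus padding */
    keys = (size_t) DATUM_BYTES * nkeys * nvalues;
    ptrs = (size_t) POINTER_BYTES * nvalues;

    return hashes + keys + ptrs + DLIST_HEAD_BYTES;
}

/* Largest number of inlined values that fits in max_bytes (0 if none fit). */
int
max_values(int nkeys, size_t max_bytes)
{
    int         n = 0;

    while (bucket_bytes(nkeys, n + 1) <= max_bytes)
        n++;
    return n;
}
```

For example, bucket_bytes(2, 3) gives 104 bytes, i.e. 1 5/8 cachelines, and max_values(1, 2 * CACHELINE_BYTES) gives 5.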

> If we stuffed four values into one bucket we could potentially SIMD the hash
> and Datum comparisons ;)

;-) That's an interesting future direction to consider when we support
building with x86-64-v2. It'd be pretty easy to compare a vector of hashes
and quickly get the array index for the key comparisons (ignoring for the
moment how to handle the rare case of multiple identical hashes).
However, we currently don't memcmp() the Datums and instead call an
"eqfast" function, so I don't see how that part would work in a vector
setting.
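To make the hash half of that idea concrete, here is a rough sketch of comparing four stored hashes against a probe hash in one step. It is an illustration only, not proposed patch code: `hash_match_mask()` and `first_match()` are hypothetical names, the vector path uses plain SSE2 intrinsics when the compiler defines `__SSE2__`, and a scalar loop stands in otherwise. Returning a bitmask rather than a single index is one way to cope with multiple identical hashes, since the caller can try each set bit in turn and still run the scalar key comparison on each candidate.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Return a 4-bit mask in which bit i is set when hashes[i] == hash.
 */
unsigned
hash_match_mask(const uint32_t hashes[4], uint32_t hash)
{
#if defined(__SSE2__)
    /* Load all four stored hashes and broadcast the probe hash. */
    __m128i     stored = _mm_loadu_si128((const __m128i *) hashes);
    __m128i     probe = _mm_set1_epi32((int) hash);

    /* Matching lanes become all-ones; collect each lane's top bit. */
    __m128i     eq = _mm_cmpeq_epi32(stored, probe);

    return (unsigned) _mm_movemask_ps(_mm_castsi128_ps(eq));
#else
    /* Scalar fallback producing the same mask. */
    unsigned    mask = 0;

    for (int i = 0; i < 4; i++)
    {
        if (hashes[i] == hash)
            mask |= 1u << i;
    }
    return mask;
#endif
}

/* Index of the lowest set bit in a 4-bit mask, or -1 if no bit is set. */
int
first_match(unsigned mask)
{
    for (int i = 0; i < 4; i++)
    {
        if (mask & (1u << i))
            return i;
    }
    return -1;
}
```

With stored hashes {10, 20, 30, 20} and a probe of 20, the mask has bits 1 and 3 set, so the key comparison would be tried at index 1 first and index 3 next if that fails.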

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Re: RFC: Improve CPU cache locality of syscache searches at 2021-08-04 19:44:44 from Andres Freund

Responses

Re: RFC: Improve CPU cache locality of syscache searches at 2021-08-05 20:12:01 from Andres Freund
