Quick Links

Re: slab allocator performance issues

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject:	Re: slab allocator performance issues
Date:	2022-12-05 08:02:06
Message-ID:	CAApHDvoUN6Kh3KBrybprL0ApLuGe_oBfjqYGW_Dff21VzZDw4Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, 11 Nov 2022 at 22:20, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
>
> On Wed, Oct 12, 2022 at 4:37 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > [v3]
>
> + /*
> + * Compute a shift that guarantees that shifting chunksPerBlock with it
> + * yields is smaller than SLAB_FREELIST_COUNT - 1 (one freelist is used for full blocks).
> + */
> + slab->freelist_shift = 0;
> + while ((slab->chunksPerBlock >> slab->freelist_shift) >= (SLAB_FREELIST_COUNT - 1))
> + slab->freelist_shift++;
>
> + /*
> + * Ensure, without a branch, that index 0 is only used for blocks entirely
> + * without free chunks.
> + * XXX: There probably is a cheaper way to do this. Needing to shift twice
> + * by slab->freelist_shift isn't great.
> + */
> + index = (freecount + (1 << slab->freelist_shift) - 1) >> slab->freelist_shift;
>
> How about something like
>
> #define SLAB_FREELIST_COUNT ((1<<3) + 1)
> index = (freecount & (SLAB_FREELIST_COUNT - 2)) + (freecount != 0);

Doesn't this create a sort of round-robin use of the free list? What
we want is a sort of "histogram" bucket set of free lists so we can
group together blocks that have a close-enough free number of chunks.
Unless I'm mistaken, I think what you have doesn't do that.

I wondered if simply:

index = -(-freecount >> slab->freelist_shift);

would be faster than Andres' version. I tried it out and on my AMD
machine, it's about the same speed. Same on a Raspberry Pi 4.

Going by [2], the instructions are very different with each method, so
other machines with different latencies on those instructions might
show something different. I attached what I used to test if anyone
else wants a go.

AMD Zen2
$ ./freecount 2000000000
Test 'a' in 0.922766 seconds
Test 'd' in 0.922762 seconds (0.000433% faster)

RPI4
$ ./freecount 2000000000
Test 'a' in 3.341350 seconds
Test 'd' in 3.338690 seconds (0.079672% faster)

That was gcc. Trying it with clang, it went in a little heavy-handed
and optimized out my loop, so some more trickery might be needed for a
useful test on that compiler.

David

[2] https://godbolt.org/z/dh95TohEG

Attachment	Content-Type	Size
freecount.c	text/plain	1.4 KB

In response to

Re: slab allocator performance issues at 2022-11-11 09:20:29 from John Naylor

Responses

Re: slab allocator performance issues at 2022-12-05 10:18:24 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2022-12-05 08:04:58	Re: pg_dump: Remove "blob" terminology
Previous Message	Michael Paquier	2022-12-05 07:44:13	Re: Generate pg_stat_get_* functions with Macros