From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: slab allocator performance issues
Date: 2021-07-17 23:10:19
Message-ID: 20210717231019.blfhyqniu6jed3ym@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2021-07-18 00:46:09 +0200, Tomas Vondra wrote:
> On 7/17/21 11:14 PM, Andres Freund wrote:
> > Hm. I wonder if we should just not populate the freelist eagerly, to
> > drive down the initialization cost. I.e. have a separate allocation path
> > for chunks that have never been allocated, by having a
> > SlabBlock->free_offset or such.
> >
> > Sure, it adds a branch to the allocation happy path, but it also makes the
> > first allocation for a chunk cheaper, because there's no need to get the next
> > element from the freelist (adding a likely cache miss). And it should make the
> > allocation of a new block faster by a lot.
> >
>
> Not sure what you mean by 'not populate eagerly' so can't comment :-(
Instead of populating a linked list with all chunks upon creation of a block -
which requires touching a fair bit of memory - keep a per-block pointer (or an
offset) into the "unused" area of the block. When allocating from the block and
there's still "unused" memory left, use that instead of bothering with the
freelist.
I tried that, and it nearly got slab up to the allocation/freeing performance
of aset.c (while winning after allocation, due to the higher memory density).
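A minimal sketch of that lazy-freelist scheme, with illustrative names (SlabBlock, slab_alloc, free_offset are placeholders here, not slab.c's actual definitions), might look like:

```c
/* Sketch: keep an offset into the never-allocated tail of the block
 * instead of threading every chunk onto a freelist at block creation.
 * Only chunks that have been allocated and freed go on the freelist. */
#include <assert.h>
#include <stdlib.h>

#define CHUNK_SIZE   64
#define BLOCK_CHUNKS 128

typedef struct SlabChunk
{
    struct SlabChunk *next;     /* freelist link, valid only while free */
} SlabChunk;

typedef struct SlabBlock
{
    char       *mem;            /* backing storage for chunks */
    size_t      free_offset;    /* start of the "unused" area */
    SlabChunk  *freelist;       /* chunks returned via slab_free() */
} SlabBlock;

static SlabBlock *
slab_block_create(void)
{
    SlabBlock  *block = malloc(sizeof(SlabBlock));

    block->mem = malloc((size_t) CHUNK_SIZE * BLOCK_CHUNKS);
    block->free_offset = 0;     /* block creation touches no chunk memory */
    block->freelist = NULL;
    return block;
}

static void *
slab_alloc(SlabBlock *block)
{
    /* Recycled chunks first: the freelist only ever holds those. */
    if (block->freelist)
    {
        SlabChunk  *chunk = block->freelist;

        block->freelist = chunk->next;
        return chunk;
    }

    /* Otherwise carve the next chunk out of the untouched area; no
     * pointer chase into cold memory on the first allocation. */
    if (block->free_offset < (size_t) CHUNK_SIZE * BLOCK_CHUNKS)
    {
        void       *chunk = block->mem + block->free_offset;

        block->free_offset += CHUNK_SIZE;
        return chunk;
    }
    return NULL;                /* block exhausted */
}

static void
slab_free(SlabBlock *block, void *ptr)
{
    SlabChunk  *chunk = ptr;

    chunk->next = block->freelist;
    block->freelist = chunk;
}
```

The extra branch on the allocation happy path is the `freelist` check; in exchange, block creation never walks the block's memory.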
> > > > 4) Less of a performance, and more of a usability issue: The constant
> > > > block size strikes me as problematic. Most users of an allocator can
> > > > sometimes be used with a small amount of data, and sometimes with a
> > > > large amount.
> > > >
> > >
> > > I doubt this is worth the effort, really. The constant block size makes
> > > various places much simpler (both to code and reason about), so this should
> > > not make a huge difference in performance. And IMHO the block size is mostly
> > > an implementation detail, so I don't see that as a usability issue.
> >
> > Hm? It's something the user has to specify, so it's not really an
> > implementation detail. It needs to be specified without sufficient
> > information, as well, since externally one doesn't know how much memory the
> > block header and chunk headers + rounding up will use, so computing a good
> > block size isn't easy. I've wondered whether it should just be a count...
> >
>
> I think this is mixing two problems - how to specify the block size, and
> whether the block size is constant (as in slab) or grows over time (as in
> allocset).
That was in response to the "implementation detail" bit solely.
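The "just a count" idea quoted above could be sketched as deriving the block size inside the allocator, so callers needn't guess at header and rounding overhead. The header sizes below are made-up placeholders, not slab.c's real values:

```c
/* Hypothetical: compute the block size from a desired chunks-per-block
 * count, accounting for (assumed) block/chunk headers and alignment. */
#include <assert.h>
#include <stddef.h>

#define MAXALIGN(s)         (((s) + 7) & ~((size_t) 7))
#define BLOCK_HEADER_SIZE   MAXALIGN(24)    /* assumed per-block overhead */
#define CHUNK_HEADER_SIZE   MAXALIGN(16)    /* assumed per-chunk overhead */

static size_t
slab_block_size_for(size_t chunk_size, size_t chunks_per_block)
{
    size_t      full_chunk = CHUNK_HEADER_SIZE + MAXALIGN(chunk_size);

    return BLOCK_HEADER_SIZE + full_chunk * chunks_per_block;
}
```

The point is only that this arithmetic lives where the header sizes are known, rather than forcing callers to reverse-engineer it.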
> But growing the block size seems problematic for long-lived contexts with
> workloads that change a lot over time - imagine e.g. decoding many small
> transactions, with one huge transaction mixed in. The one huge transaction
> will grow the block size, and we'll keep using it forever. But in that case
> we might have just as well allocate the large blocks from the start, I
> guess.
I was thinking of capping the growth fairly low. I don't think you're likely to
still see allocation performance gains with slab after a 16x growth or so. And
I don't think that'd be too bad for decoding - we'd start with a small initial
block size, which in many workloads would be enough, and only workloads where
that doesn't suffice would adapt, performance-wise. And: medium term I wouldn't
expect reorderbuffer.c to stay the only slab.c user...
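Capped geometric growth along those lines could be as simple as the following sketch (the 16x cap is the figure suggested above; the function name is illustrative):

```c
/* Sketch: double the block size for each newly allocated block, but
 * never beyond a fixed multiple of the initial size, so one huge
 * transaction can only grow the context's block size so far. */
#include <assert.h>
#include <stddef.h>

#define GROWTH_CAP 16           /* "16x growth or so" */

static size_t
next_block_size(size_t cur_size, size_t init_size)
{
    size_t      cap = init_size * GROWTH_CAP;

    if (cur_size >= cap)
        return cap;             /* growth stops here */
    return cur_size * 2;
}
```

Small transactions then pay only for the small initial blocks, while a workload that keeps allocating converges on the capped size after four doublings.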
> > Why do you not think it's relevant for performance? Either one wastes memory
> > by using too large a block size, or one ends up losing perf through frequent
> > allocations?
>
> True. I simply would not expect this to make a huge difference - I may be
> wrong, and I'm sure there are workloads where it matters. But I still think
> it's easier to just use larger blocks than to make the slab code more
> complex.
IDK. I'm looking at using slab as part of a radix tree implementation right
now. Which I'd hope to be used in various different situations. So it's hard
to choose the right block size - and it does seem to matter for performance.
Greetings,
Andres Freund