Re: BUG #14231: logical replication wal sender process spins when using error traps in function

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-bugs(at)postgresql(dot)org, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, blake(at)rcmail(dot)com
Subject: Re: BUG #14231: logical replication wal sender process spins when using error traps in function
Date: 2016-07-19 16:22:13
Message-ID: 613a6533-1f20-d8eb-c852-414fd159c758@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 07/19/2016 03:35 PM, Tomas Vondra wrote:
> On 07/19/2016 07:27 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2016-07-19 07:04:43 +0200, Tomas Vondra wrote:
>>> So, I've spent a few hours experimenting with this idea, and I
>>> believe it seems like a good way forward. Attached is an early WIP
>>> patch that does this:
>>
>> Cool! Thanks for looking into this.
>>
>>
>>> I'm not sure whether the SlabContext is needed here - perhaps
>>> AllocSets would do equally well, at least in this case. I however
>>> wonder whether the existing local slab cache can actually improve
>>> anything, because what I see is a long sequence of pallocs()
>>> followed by a long sequence of pfrees(), which makes any reuse
>>> impossible. But I guess there are other cases where the palloc()
>>> and pfree() calls are mixed.
>>
>> In something more oltp like you'll see exactly that. IIRC, in
>> pgbench, we pretty much never allocate for the slab allocated stuff,
>> after some warmup.
>
> Yeah. Haven't done much testing at night. I think we'll need a set of
> test cases to validate the changes. We certainly don't want to improve
> one case and regress two others.
>
>>
>>> The other thing is that splitting the tuple context into two parts
>>> seems a bit wasteful - this mostly follows the MaxHeapTupleSize
>>> idea from the existing slab code, but how many tuples actually are
>>> this big?
>>
>> The issue is that allocating them differently sized leads to massive
>> memory fragmentation over time.
>>
>
> Sure, in all allocators we basically trade space and time efficiency. If
> we could get some apriori some estimate of the tuple size, we could
> probably tune this. It's just that ~8kB per tuple seems a bit too high.

[one huge cup of coffee later ...]

So, I think it's actually quite simple to improve this by using a
generational design, i.e. creating a sequence of slab contexts every N
tuples, using the average tuple length from the previous batch as input
for the next one (multiplied by 1.5x to give a bit of slack). Maybe this
heuristics is far too simple, but perhaps we can

Attached is a v2 of the patch doing pretty much this - I wouldn't call
this even half-baked, it's a rather early PoC thingy. But it seems to
work, and for the test case from the initial report, it significantly
reduces memory consumption by a factor of 100x, as the average tuple
length is ~80B, not 8kB. It increases the runtime a bit (say, from 700ms
to 800ms for the 100k test case), but that seems negligible compared to
the master behavior and worth considering the memory consumption reduction.

This relies on one of the SlabContext features - it can easily see which
blocks are free because it has 'free bitmap' in each block, so old
contexts get automatically freed once the tuples are pfreed. I have not
made any attempt to free the context structure itself, which is a memory
leak, but I feel lazy and it's good enough for a PoC.

The AllocSet obviously can't do that, so that part is not part of the
generation design - it would just prevent reuse of the memory, thus
increasing memory consumption. But if we could make the AllocSet capable
of doing AllocSetIsEmpty reasonably (and not just when it's reset, which
is pretty useless I believe), we could discard the whole context (making
the generational design viable).

Another SlabContext detail worth mentioning is that it actually makes
the chunk header larger, by plugging in an additional pointer (to the
block). In the end this is what allows in-block free bitmap etc. and is
more than compensated for by not doing doubling the chunk size etc.

I don't expect to have much time to work on this in the next few weeks,
so if anyone wants to grab this and make it a proper patch, be my guest.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
slab-reorder-buffer-v2.patch binary/octet-stream 29.9 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tomas Vondra 2016-07-19 16:28:34 Re: BUG #14231: logical replication wal sender process spins when using error traps in function
Previous Message amdecreme 2016-07-19 14:35:20 BUG #14261: Hanged randomly in hash_seq_search