From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Vacuum: allow usage of more than 1GB of work mem |
Date: | 2016-09-06 16:58:25 |
Message-ID: | CAGTBQpZTUbsSM7GYxoqVkiuPkQR3xj0+KwhUXArg6mMmco88dw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Sep 4, 2016 at 8:10 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 9/4/16 1:46 AM, Simon Riggs wrote:
>>>
>>> > The patch also makes vacuum free the dead_tuples before starting
>>> > truncation. It didn't seem necessary to hold onto it beyond that
>>> > point, and it might help give the OS more cache, especially if work
>>> > mem is configured very high to avoid multiple index scans.
>>
>> How long does that part ever take? Is there any substantial gain from
>> this?
>
>
> If you're asking about how long the dealloc takes, we're going to have to
> pay that cost anyway when the context gets destroyed/reset, no? Doing that
> sooner rather than later certainly seems like a good idea since we've seen
> that truncation can take quite some time. Might as well give the memory back
> to the OS ASAP.
AFAIK, except on debug builds where it has to memset the whole thing,
the cost is constant (unrelated to the allocated block size), so it
should be rather small in this context.
On Tue, Sep 6, 2016 at 1:42 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Sep 3, 2016 at 8:55 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> The attached patch allows setting maintainance_work_mem or
>> autovacuum_work_mem higher than 1GB (and be effective), by turning the
>> allocation of the dead_tuples into a huge allocation.
>>
>> This results in fewer index scans for heavily bloated tables, and
>> could be a lifesaver in many situations (in particular, the situation
>> I'm living right now in production, where we don't have enough room
>> for a vacuum full, and have just deleted 75% of a table to make room
>> but have to rely on regular lazy vacuum to free the space).
>>
>> The patch also makes vacuum free the dead_tuples before starting
>> truncation. It didn't seem necessary to hold onto it beyond that
>> point, and it might help give the OS more cache, especially if work
>> mem is configured very high to avoid multiple index scans.
>>
>> Tested with pgbench scale 4000 after deleting the whole
>> pgbench_accounts table, seemed to work fine.
>
> The problem with this is that we allocate the entire amount of
> maintenance_work_mem even when the number of actual dead tuples turns
> out to be very small. That's not so bad if the amount of memory we're
> potentially wasting is limited to ~1 GB, but it seems pretty dangerous
> to remove the 1 GB limit, because somebody might have
> maintenance_work_mem set to tens or hundreds of gigabytes to speed
> index creation, and allocating that much space for a VACUUM that
> encounters 1 dead tuple does not seem like a good plan.
>
> What I think we need to do is make some provision to initially
> allocate only a small amount of memory and then grow the allocation
> later if needed. For example, instead of having
> vacrelstats->dead_tuples be declared as ItemPointer, declare it as
> ItemPointer * and allocate the array progressively in segments. I'd
> actually argue that the segment size should be substantially smaller
> than 1 GB, like say 64MB; there are still some people running systems
> which are small enough that allocating 1 GB when we may need only 6
> bytes can drive the system into OOM.
This would however incur the cost of having to copy the whole GB-sized
chunk every time it's expanded. It woudln't be cheap.
I've monitored the vacuum as it runs and the OS doesn't map the whole
block unless it's touched, which it isn't until dead tuples are found.
Surely, if overcommit is disabled (as it should), it could exhaust the
virtual address space if set very high, but it wouldn't really use the
memory unless it's needed, it would merely reserve it.
To fix that, rather than repalloc the whole thing, dead_tuples would
have to be an ItemPointer** of sorted chunks. That'd be a
significantly more complex patch, but at least it wouldn't incur the
memcpy. I could attempt that, but I don't see the difference between
vacuum and create index in this case. Both could allocate a huge chunk
of the virtual address space if maintainance work mem says so, both
proportional to the size of the table. I can't see how that could take
any DBA by surprise.
A sensible compromise could be dividing the maintainance_work_mem by
autovacuum_max_workers when used in autovacuum, as is done for cost
limits, to protect those that set both rather high.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-09-06 17:00:45 | Re: (Comment)Bug in CteScanNext |
Previous Message | Heikki Linnakangas | 2016-09-06 16:57:28 | Re: GiST penalty functions [PoC] |