Re: (auto)vacuum truncate exclusive lock

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: (auto)vacuum truncate exclusive lock
Date: 2013-04-12 18:42:51
Message-ID: 1365792171.53572.YahooMailNeo@web162902.mail.bf1.yahoo.com
Lists: pgsql-hackers

[some relevant dropped bits of the thread restored]

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>>>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

>>>> I propose to do the following:

>>>> (1)  Restore the prior behavior of the VACUUM command.  This
>>>> was only ever intended to be a fix for a serious autovacuum
>>>> problem which caused many users serious performance problems

>>>> (2)  If autovacuum decides to try to truncate but the lock
>>>> cannot be initially acquired, and analyze is requested, skip
>>>> the truncation and do the autoanalyze.

>>> I think that the minimum appropriate fix here is to [...] take
>>> out the suppression of stats reporting and analysis.
>>
>> I'm not sure I understand -- are you proposing that is all we do
>> for both the VACUUM command and autovacuum?
>
> No, I said that was the minimum fix.

OK, I suggested that and more, so I wasn't sure what you were
getting at.

>>>>> OK, I see that now.  In the old behavior, if the lock was
>>>>> acquired, but then we were shoved off from it, the analyze
>>>>> was not done.  But, in the old behavior if the lock was never
>>>>> acquired at all, then it would go ahead to do the
>>>>> autoanalyze,

>>>> Ah, I see now.  So the actual worst case for the old code, in
>>>> terms of both head-banging and statistics, was if autovacuum
>>>> was able to acquire the lock but then many tasks all piled up
>>>> behind its lock.  If the system was even *more* busy it would
>>>> not acquire the lock at all, and would behave better.

> and I suppose the rationale for suppressing the stats report was
> this same idea of lying to the stats collector in order to
> encourage a new vacuum attempt to happen right away.

I think Jan expressed some such sentiment back during the original
discussion.  I was not persuaded by that; but he pointed out that
if the deadlock killer killed an autovacuum process which was doing
a truncate, then it did not get to the statistics phase; so I agreed
that any change in that behavior should be a separate patch.  I
missed the fact that if it failed to initially get the lock it did
proceed to the statistics phase.  I explained this earlier in this
thread.  No need to cast about for hypothetical explanations.

> Now I'm not sure that that's a good idea at all

I'm pretty sure it isn't; that's why I proposed changing it.

> But if it is reasonable, we need a redesign of the reporting
> messages, not just a hack to not tell the stats collector what we
> did.

The idea was to try to make as small a change in previous behavior
as possible.  Jan pointed out that when the deadlock detection code
killed an autovacuum worker which was trying to truncate, the
statistics were not updated, leading to retries.  This was an
attempt to *not change* existing behavior.  It was wrong, because
we both missed the fact that if it didn't get the lock in the first
place it went ahead with statistics generation.  That being the
case, I was proposing we always generate statistics if we were
supposed to.  That would be a change toward *more* up-to-date
statistics and *fewer* truncation retries than we've had.  I'm OK
with that because a table hot enough to hit the issue will likely
need the space again or need another vacuum soon.
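
To make the old behavior concrete, here is a minimal sketch in Python
of the three outcomes described in this thread.  It is only a model of
the discussion, not PostgreSQL source; the names TruncateOutcome and
old_behavior_reaches_statistics are hypothetical.

```python
from enum import Enum


class TruncateOutcome(Enum):
    """Hypothetical labels for what happened to the truncation attempt."""
    LOCK_NOT_ACQUIRED = "lock never acquired"
    KILLED_WHILE_TRUNCATING = "lock acquired, worker killed by deadlock detection"
    COMPLETED = "truncation completed"


def old_behavior_reaches_statistics(outcome):
    """Model of the old code path as described in the thread.

    Only the worker killed during truncation failed to reach the
    statistics phase; a worker that never got the lock carried on.
    """
    return outcome is not TruncateOutcome.KILLED_WHILE_TRUNCATING


for outcome in TruncateOutcome:
    print(outcome.value, "->", old_behavior_reaches_statistics(outcome))
```

The asymmetry is the point: the busier system (lock never acquired)
got its statistics, while the slightly less busy one did not.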

> Are you saying you intend to revert that whole concept?

No.  I was merely asking what you were suggesting.  As I said
earlier:

>>>> I have seen cases where the old logic head-banged for hours or
>>>> days without succeeding at the truncation attempt in
>>>> autovacuum, absolutely killing performance until the user ran
>>>> an explicit VACUUM.  And in the meantime, since the deadlock
>>>> detection logic was killing autovacuum before it got to the
>>>> analyze phase, the autoanalyze was never done.

> Otherwise we need some thought about how to inform the stats
> collector what's really happening.

I think we can probably improve that on some future release.  I
don't think a new scheme for that makes sense for back-patching or
9.3.

For now what I'm suggesting is generating statistics in all the
cases it did before, plus the case where it starts truncation but
does not complete it.  The fact that before this patch there were
cases where the autovacuum worker was killed, and so never
generated needed statistics, seems like a bug, not a behavior we
need to preserve.
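
As a rough illustration of that rule, the following Python sketch
models one autovacuum pass.  It is an assumption-laden toy, not the
patch: autovacuum_pass, try_lock and truncate are hypothetical names,
and truncate() returning False stands in for an attempt that starts
but does not complete.

```python
def autovacuum_pass(analyze_requested, try_lock, truncate):
    """Toy model of the suggested behavior (hypothetical names).

    try_lock():  returns True if the exclusive lock was acquired.
    truncate():  returns True if truncation completed, False if the
                 attempt was abandoned partway.
    """
    truncated = False
    if try_lock():
        truncated = truncate()
    # Suggested rule: the truncation outcome never suppresses statistics.
    analyzed = analyze_requested
    return {"truncated": truncated, "analyzed": analyzed}


# Lock unavailable: skip truncation, still analyze.
print(autovacuum_pass(True, lambda: False, lambda: True))
# Truncation started but did not complete: still analyze.
print(autovacuum_pass(True, lambda: True, lambda: False))
```

In this model the analyze decision depends only on whether analyze
was requested, which is the change relative to the old code.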

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
