Re: autovacuum scheduling starvation and frenzy

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum scheduling starvation and frenzy
Date: 2014-06-23 22:18:19
Message-ID: CAMkU=1yxGsaGMLw=iht1p2WEF9imqW_Tr8toqaW12aAPsGYn5Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 15, 2014 at 4:06 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, May 15, 2014 at 12:55 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> wrote:
>>
>> Jeff Janes wrote:
>>
>> > If you have a database with a large table in it that has just passed
>> > autovacuum_freeze_max_age, all future workers will be funnelled into
>> > that database until the wrap-around completes. But only one of those
>> > workers can actually vacuum the one table which is holding back the
>> > frozenxid. Maybe the 2nd worker to come along will find other useful
>> > work to do, but eventually all the vacuuming that needs doing is
>> > already in progress, and so each worker starts up, gets directed to
>> > this database, finds it can't help, and exits. So all other databases
>> > are entirely starved of autovacuuming for the entire duration of the
>> > wrap-around vacuuming of this one large table.
>>
>> Bah. Of course :-(
>>
>> Note that if you have two databases in danger of wraparound, the oldest
>> will always be chosen until it's no longer in danger. Ignoring the
>> second one past freeze_max_age seems bad also.
>
>
> I'm not sure how bad that is. If you really do want to get the frozenxid
> advanced as soon as possible, it makes sense to focus on one at a time,
> rather than splitting the available IO throughput between two of them. So I
> wouldn't go out of my way to enable two to run at the same time, nor go out
> of my way to prevent it.
>
> If most wrap around scans were done as part of a true emergency it would
> make sense to forbid all other vacuums (but only if you also automatically
> disabled autovacuum_vacuum_cost_delay as part of the emergency) so as not to
> divide up the IO throughput. But most are not emergencies, as 200,000,000
> is a long way from 2,000,000,000.
>
>
>>
>>
>> This code is in autovacuum.c, do_start_worker(). Not sure what your
>> proposal would look like in terms of code.
>
>
> I wasn't sure either; I was mostly trying to analyze the situation. But I
> decided just moving the "skipit" chunk of code above the wrap-around code
> might work for experimental purposes, as attached. It has been running for
> a few hours that way and I no longer see the frenzies occurring whenever
> pgbench_history gets vacuumed.

I didn't add this patch to the commitfest, because it was just a point
for discussion and not actually proposed for application. But it
doesn't seem to have provoked much discussion either.

Should I go add this to the next commitfest?

I do see it listed as a resolved item in
https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items

But I can't find a commit that would resolve it, so does that mean the
resolution was that the behavior was not new in 9.4 and so didn't need
to be fixed for it?

Cheers,

Jeff
