Re: Speeding up an in-progress wraparound-preventing vacuum

From: Vincent de Phily <vincent(dot)dephily(at)mobile-devices(dot)fr>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Speeding up an in-progress wraparound-preventing vacuum
Date: 2014-12-09 00:58:11
Message-ID: 3303554.B2OmElZI50@moltowork
Lists: pgsql-general

On Monday 08 December 2014 10:17:37 Jeff Janes wrote:
> On Mon, Dec 8, 2014 at 4:54 AM, Vincent de Phily <
>
> > bloat, which I'd like to get back asap). Currently about 80% of the IO
> > is devoted to the vacuum process (on average throughout the day, as
> > extrapolated from atop output).
>
> Is that 80% of the actually occurring IO, or 80% of the maximum possible IO?

80% of max possible IO, although I'm not sure how atop determines what the max
is. It's a fairly reliable metric of "the drive is as busy as it can be" in my
experience.

> Increasing maintenance_work_mem even further, at least temporarily and
> locally for this operation, might be a good idea.

Ok, I thought 512M was already overgenerous, but I'll try increasing that too.
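
For reference, I guess the session-local version would look something like
this ("my_big_table" is just a placeholder, and I haven't checked whether
vacuum can actually use more than 1GB for its dead-tuple array):

    -- only affects the current session, not the server-wide setting
    SET maintenance_work_mem = '2GB';
    -- freeze aggressively and report progress for each table/index pass
    VACUUM FREEZE VERBOSE my_big_table;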

> > My first question is: is there a way to speed up the currently-running
> > vacuum process? I guess killing it to let it be replaced by a more
> > aggressively-configured one would work, but I'd hate to lose 2 weeks of
> > processing (is there a way to estimate how much more vacuuming work
> > remains to be done?), and I'm being a bit more cautious with the
> > wraparound-preventing kind.
>
> I don't know of a way to speed it up gracefully. That has frustrated me a
> few times, and a solution would really be nice.
>
> If you kill it, the new process will have to re-read the entire table, but
> it will have much less work to do since the killed process already 'plowed
> a path' for it. In a sense, killing the process will throw away all of the
> sequential read work on the table that has already been done, but the
> index reads and the writing workload are not all lost; it will save the
> new process time on those.

Ah, good point. Even if it has to re-read all the pages, it'll only have to
write frozenxid for the pages that weren't processed before. That's good to
hear.

Isn't there also something about vacuum marking a page as "all visible", so
that it can be skipped by the next run?

I don't see which index reads would be saved between vacuum runs. I have 64G
of RAM on this box, so there's no hope of the indexes staying in memory.
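
For the record, here's how I'd sanity-check that ("my_big_table" again being
a placeholder); if the index total is anywhere near 64G, caching them is
hopeless:

    -- compare the combined size of the table's indexes to the heap itself
    SELECT pg_size_pretty(pg_indexes_size('my_big_table')) AS indexes,
           pg_size_pretty(pg_table_size('my_big_table'))   AS heap;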

> You can `strace` for the lseek command to see which file handles it is
> currently working on, and use lsof to turn those into names. You want to
> look at where it is in the table files, not the index files.

Neat trick, thanks. I'll try it in the morning when I'm fresh. Are the files
read in C-sorting order?
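
In case anyone else wants to try it: once lsof gives a path like
base/<db_oid>/<filenode>, I suppose mapping it back to a table would go
something like this (16385 is a made-up filenode; strip any ".N" segment
suffix first):

    -- find the relation behind a file node seen in lsof output
    SELECT relname FROM pg_class WHERE relfilenode = 16385;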

> > Second question is: how come we reached the wraparound threshold on this
> > table at all? We've only been keeping 28 days of data in this table,
> > doing incremental deletes every day (there are no updates at all). I find
> > it very unlikely that we'd go through 2 billion transactions in that
> > timespan (that'd need 890 transactions per second, and we're well below
> > 100). The pg_class.relfrozenxid on that table is at 680M, while most
> > other tables are around 860M. Could it be that routine vacuums haven't
> > been able to update the relfrozenxid in a long long time, or am I
> > missing something else?
>
> PostgreSQL doesn't know what your delete routine is like. It has to verify
> with its own eyes that there are no rows over a certain age.

My assumption was that routine vacuuming was able to update the table's
relfrozenxid, but I realize now that PG needs to do a full scan before
updating it.
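
In the meantime I can at least watch which tables are next in line, with a
plain catalog query (nothing specific to my setup):

    -- tables with the oldest relfrozenxid, i.e. the next candidates
    -- for an anti-wraparound vacuum
    SELECT relname, age(relfrozenxid)
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY age(relfrozenxid) DESC
    LIMIT 10;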

> I don't think that routine vacuums even attempt to update relfrozenxid, or
> at least they don't try very hard.

AFAIU it does (the 6th paragraph in 23.1.4 seems to imply it), and
vacuum_freeze_min_age controls when. Perhaps lowering that value would help
do a lot of the anti-wraparound work in advance.
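
If I go that route, I suppose the per-table storage parameter is the
cleanest way (the value here is picked arbitrarily, untested):

    -- ask autovacuum to freeze tuples much earlier than the default
    -- 50M-transaction threshold, for this table only
    ALTER TABLE my_big_table SET (autovacuum_freeze_min_age = 1000000);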

Pity there's no "frozen pages map" (or is there?) to keep track of pages with
all-frozen tuples; it could speed up that anti-wraparound vacuum.

> Are you sure that routine vacuums have been running to completion on this
> table, as opposed to getting interrupted by something before finishing each
> time?

I checked pg_stat_user_tables.last_{auto_,}vacuum for this table and they are
both null. This is seriously worrying. I've seen autovacuum take a few days
on this table, but I always assumed that it finished properly. And I'm pretty
sure vacuuming does some work, otherwise my disk usage would grow much
faster. I have no idea what could cause vacuuming to systematically stop
before the end. Maybe I'll find something in the logs.
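
I'll also add something like this to our monitoring so we notice next time
(a plain pg_stat query, nothing exotic), and probably set
log_autovacuum_min_duration = 0 in postgresql.conf to get a log line for
every autovacuum run:

    -- tables accumulating dead rows with no completed vacuum on record
    SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
    FROM pg_stat_user_tables
    WHERE last_vacuum IS NULL AND last_autovacuum IS NULL
    ORDER BY n_dead_tup DESC;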

Thanks for your answers.

--
Vincent de Phily
