Re: cost based vacuum (parallel)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-06 03:51:28
Message-ID: 20191106035128.GR6962@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Amit Kapila (amit(dot)kapila16(at)gmail(dot)com) wrote:
> On Tue, Nov 5, 2019 at 1:42 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > > That's quite doable independent of parallelism, as we don't have tables
> > > or indexes spanning more than one tablespace. True, you could then make
> > > the processing of an individual vacuum faster by allowing to utilize
> > > multiple tablespace budgets at the same time.
> >
> > Yes, it's possible to do independent of parallelism, but what I was
> > trying to get at above is that it might not be worth the effort. When
> > it comes to parallel vacuum though, I'm not sure that you can just punt
> > on this question since you'll naturally end up spanning multiple
> > tablespaces concurrently, at least if the heap+indexes are spread across
> > multiple tablespaces and you're operating against more than one of those
> > relations at a time
>
> Each parallel worker operates on a separate index. It might be worth
> exploring per-tablespace vacuum throttling, but that should not be a
> requirement for the currently proposed patch.

Right, each worker operating on a separate index in parallel is what I
had figured was happening, and that's why I raised the question: what
does IO throttling mean when multiple tablespaces, with presumably
independent IO channels, are involved? (Or, at least, that's what I was
trying to get at.)
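
As a rough illustration of what per-tablespace throttling could look
like, here is a minimal sketch that keeps one cost balance per
tablespace rather than one per vacuum. This is purely hypothetical: the
class, its method, and the tablespace names are invented for
illustration, and nothing like it exists in the proposed patch or in
the tree. The limit of 200 mirrors the default vacuum_cost_limit; the
per-page cost of 10 is just an illustrative number.

```python
from collections import defaultdict

COST_LIMIT = 200      # plays the role of vacuum_cost_limit (default 200)
PAGE_MISS_COST = 10   # illustrative per-page cost, an assumption


class PerTablespaceThrottle:
    """Hypothetical: track accumulated vacuum cost separately per tablespace."""

    def __init__(self, cost_limit=COST_LIMIT):
        self.cost_limit = cost_limit
        # tablespace name -> cost accumulated since the last nap
        self.balance = defaultdict(int)

    def charge(self, tablespace, cost):
        """Add cost to one tablespace's balance.

        Returns True when that tablespace's balance has reached the limit,
        meaning the caller should nap before touching it again. Balances
        of other tablespaces are not affected.
        """
        self.balance[tablespace] += cost
        if self.balance[tablespace] >= self.cost_limit:
            self.balance[tablespace] = 0
            return True
        return False
```

With this shape, a worker vacuuming an index in one tablespace would
only be told to nap based on the IO charged to that tablespace, not on
what the other workers are doing elsewhere.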

That question doesn't arise within a single vacuum operation as the
code works today, since we're never operating on more than one relation
concurrently in that case.

Of course, we don't currently do anything to manage IO utilization
across tablespaces when there are multiple autovacuum workers running
concurrently, which I suppose goes to Andres' point that we aren't
really doing anything to deal with this today, and therefore this is
perhaps not all that new an issue with the addition of parallel vacuum.
I'd still argue that it becomes a lot more apparent when you're talking
about a single parallel vacuum, but ultimately we should probably be
thinking about how to manage resources across all of the vacuums,
tablespaces, and queries.

In an ideal world, we'd track the IO from front-end queries, have some
idea of the total IO possible for each IO channel, and allow vacuum and
whatever other background processes need to run to scale up and down,
leaving enough headroom to avoid ever being maxed out on IO while
keeping up a consistent rate of IO that lets everything finish as
quickly as possible.
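
To put some arithmetic behind that, here is a minimal sketch of how a
background IO budget for one channel might be derived. Everything in it
is an assumption made for illustration: the function name, the idea
that capacity and front-end load are known as simple IOPS figures, and
the 10% default reserve. PostgreSQL does not track any of this today.

```python
def background_io_budget(capacity_iops, foreground_iops, headroom=0.1):
    """Hypothetical: IO rate left over for vacuum and other background work.

    capacity_iops   -- assumed total IO the channel can sustain
    foreground_iops -- IO currently consumed by front-end queries
    headroom        -- fraction of capacity kept in reserve so the
                       channel is never fully saturated
    """
    usable = capacity_iops * (1.0 - headroom)
    # Background work gets whatever front-end queries leave unused,
    # and is squeezed to zero when the front end needs it all.
    return max(0.0, usable - foreground_iops)
```

For example, a channel assumed to sustain 1000 IOPS with 300 IOPS of
front-end load and a 10% reserve would leave roughly 600 IOPS for
background work, and that figure would shrink as query load grows.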

Thanks,

Stephen
