Re: cost based vacuum (parallel)

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-18 06:40:59
Message-ID: CA+fd4k4T2udSkcDWKix1s18bKMVworsRXm0ZAujtQ7tJk0XAUg@mail.gmail.com
Lists: pgsql-hackers

On Fri, 15 Nov 2019 at 11:54, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Nov 13, 2019 at 10:02 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > I've done some tests while changing the shared buffer size, delays,
> > and number of workers. The overall results have a similar tendency
> > to the results shared by Dilip and look reasonable to me.
> >
>
> Thanks, Sawada-san, for repeating the tests. I can see from your,
> Dilip's, and Mahendra's testing that the delay is distributed
> depending on the I/O done by a particular worker, and that the total
> I/O is also as expected in various kinds of scenarios. So, I think
> this is a better approach. Do you agree, or do you think we should
> still investigate another approach as well?
>
> I would like to summarize this approach. The basic idea for parallel
> vacuum is to allow the parallel workers and the master backend to
> have a shared view of the vacuum cost related parameters (mainly
> VacuumCostBalance), to let each worker update it, and then decide
> based on that whether it needs to sleep. With this basic idea alone,
> we found that in some cases the throttling is not accurate, as
> explained with an example in my email above [1] and by the tests
> performed by Dilip and others in the following emails (in short, the
> workers doing more I/O can be throttled less). Then, as discussed in
> a later email [2], we tried a way to avoid putting to sleep workers
> that have done less or no I/O compared to other workers. This
> ensured that workers doing more I/O get throttled more. The idea is
> to allow a worker to sleep only if it has performed I/O above a
> certain threshold and the overall balance is more than the cost_limit
> set by the system. We then allow the worker to sleep in proportion
> to the work done by it and reduce VacuumSharedCostBalance by the
> amount consumed by the current worker. This scheme leads to the
> desired throttling of different workers based on the work done by
> each individual worker.
>
> We have tested this idea with various kinds of workloads, such as
> varying the shared buffer size, delays, and number of workers. We
> have also tried different numbers of indexes and workers. In all the
> tests, we found that the workers are throttled in proportion to the
> I/O done by each particular worker.

Thank you for summarizing!

I agree with this approach.
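
To make the summarized scheme concrete, here is a minimal sketch of the
delay logic as I understand it. This is for illustration only, not the
exact patch: the helper name parallel_vacuum_delay_point, the nworkers
variable, the worker-local VacuumCostBalanceLocal, and the use of a
pg_atomic_uint32 for VacuumSharedCostBalance are my assumptions.

#include "postgres.h"

#include "miscadmin.h"          /* VacuumCostBalance, VacuumCostLimit, VacuumCostDelay */
#include "port/atomics.h"

/* Assumed to live in the DSM segment shared by the leader and workers. */
extern pg_atomic_uint32 *VacuumSharedCostBalance;

/* Assumed number of participants (workers plus leader). */
extern int      nworkers;

/* Cost this process has accumulated so far (process-local). */
static double   VacuumCostBalanceLocal = 0;

/*
 * A worker sleeps only if it has itself done at least a fair share of
 * the work and the shared balance has exceeded the cost limit.  The
 * sleep time is proportional to the work done by this worker, and that
 * work is then charged against the shared balance.
 */
static void
parallel_vacuum_delay_point(void)
{
    double      msec = 0;

    /* Publish the cost accumulated since the last check. */
    pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
                            (uint32) VacuumCostBalance);
    VacuumCostBalanceLocal += VacuumCostBalance;
    VacuumCostBalance = 0;

    if (VacuumCostBalanceLocal > 0.5 * ((double) VacuumCostLimit / nworkers) &&
        pg_atomic_read_u32(VacuumSharedCostBalance) >= (uint32) VacuumCostLimit)
    {
        /* Sleep in proportion to the work done by this worker ... */
        msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;

        /* ... and subtract that work from the shared balance. */
        pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
                                (uint32) VacuumCostBalanceLocal);
        VacuumCostBalanceLocal = 0;
    }

    if (msec > 0)
        pg_usleep((long) (msec * 1000));
}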

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
