Re: [HACKERS] Block level parallel vacuum

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] Block level parallel vacuum
Date:	2019-12-31 06:13:52
Message-ID:	CAA4eK1+1o-BaPvJnK7BPThTryx3MRDS+mCf9eVVZT=SVJ8mwLg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Dec 30, 2019 at 6:37 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Mon, Dec 30, 2019 at 10:40:39AM +0530, Amit Kapila wrote:
> >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >>
> >
> >+1. It is already a separate patch and I think we can even discuss
> >more on it in a new thread once the main patch is committed or do you
> >think we should have a conclusion about it now itself? To me, this
> >option appears to be an extension to the main feature which can be
> >useful for some users and people might like to have a separate option,
> >so we can discuss it and get broader feedback after the main patch is
> >committed.
> >
>
> I don't think it's an extension of the main feature - it does not depend
> on it, it could be committed before or after the parallel vacuum (with
> some conflicts, but the feature itself is not affected).
>
> My point was that by moving it into a separate thread we're more likely
> to get feedback on it, e.g. from people who don't feel like reviewing
> the parallel vacuum feature and/or feel intimidated by 100+ messages in
> this thread.
>

I agree with this point.

> >> >>
> >> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
> >> >> we need a separate VACUUM option, instead of just using the existing
> >> >> max_parallel_maintenance_workers GUC?
> >> >>
> >
> >How will user specify parallel degree? The parallel degree is helpful
> >because in some cases users can decide how many workers should be
> >launched based on size and type of indexes.
> >
>
> By setting max_parallel_maintenance_workers.
>
> >> >> It's good enough for CREATE INDEX
> >> >> so why not here?
> >> >
> >
> >That is a different feature and I think here users can make a better
> >judgment based on the size of indexes. Moreover, users have an option
> >to control a parallel degree for 'Create Index' via Alter Table
> ><tbl_name> Set (parallel_workers = <n>) which I am not sure is a good
> >idea for parallel vacuum as the parallelism is more derived from size
> >and type of indexes. Now, we can think of a similar parameter at the
> >table/index level for parallel vacuum, but I don't see it equally
> >useful in this case.
> >
>
> I'm a bit skeptical about users being able to pick good parallel degree.
> If we (i.e. experienced developers/hackers with quite a bit of
> knowledge) can't come up with a reasonable heuristic, how likely is it
> that a regular user will come up with something better?
>

In this case, the useful degree depends heavily on the number of
indexes (we can spawn one worker per index), so it is a bit easier for
users to specify. We can also work out the same value internally, and
we do that when the user doesn't specify one, but that can easily
consume more resources (CPU, I/O) than the user wants to spend on a
particular vacuum. So, giving the user an option sounds quite
reasonable to me. Anyway, if the user doesn't specify the
parallel_degree, we are going to select one internally.
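
A rough sketch of the selection described above, not the code from the
patch. The function and parameter names are hypothetical, and treating
max_parallel_maintenance_workers as an upper bound on a user-specified
degree is an assumption made only for illustration:

```python
from typing import Optional


def choose_vacuum_workers(
    n_indexes: int,
    requested: Optional[int],
    max_maintenance_workers: int,
) -> int:
    """Pick a worker count for a parallel index vacuum (illustrative only).

    n_indexes: number of indexes on the table being vacuumed.
    requested: degree given by the user, or None if not specified.
    max_maintenance_workers: assumed global cap, standing in for
        max_parallel_maintenance_workers.
    """
    if n_indexes <= 0:
        return 0

    # At most one worker per index, so more workers than indexes is useless.
    cap = n_indexes

    # A user-specified degree can only lower the cap in this sketch.
    if requested is not None:
        cap = min(cap, requested)

    # Assumption: the global setting bounds the result in both cases.
    return max(0, min(cap, max_maintenance_workers))
```

With four indexes and no requested degree, this sketch would use one
worker per index (subject to the global cap); a user who wants a lighter
vacuum could request two and get two.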

> Not sure I understand why "parallel_workers" would not be suitable for
> parallel vacuum? I mean, even for CREATE INDEX it certainly matters the
> size/type of indexes, no?
>

The difference is that in parallel vacuum each worker can scan a
separate index, whereas parallel_workers is more of an option for
scanning the heap in parallel. So, increasing parallel_workers helps
when the heap is bigger, whereas increasing the corresponding value for
parallel vacuum helps when the table has more indexes.
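
To make the contrast concrete, here is a toy comparison of the two
scaling rules. Both functions, their names, and their thresholds are
hypothetical: the heap rule (one more worker each time the heap
triples) is only a stand-in for "scales with heap size" and is not
claimed to be what PostgreSQL does.

```python
def heap_scan_workers(heap_pages: int,
                      threshold_pages: int = 1024,
                      max_workers: int = 8) -> int:
    """Toy rule: the worker count grows with heap size."""
    if heap_pages < threshold_pages:
        return 0
    workers = 1
    limit = threshold_pages
    # Add one worker every time the heap is three times larger again.
    while heap_pages >= limit * 3 and workers < max_workers:
        workers += 1
        limit *= 3
    return workers


def index_vacuum_workers(n_indexes: int, max_workers: int = 8) -> int:
    """Toy rule: the worker count grows with the number of indexes."""
    # Heap size plays no role here; each worker takes one index.
    return max(0, min(n_indexes, max_workers))
```

Under these assumptions a huge table with a single index gets many
heap-scan workers but only one index-vacuum worker, while a small table
with five indexes gets the opposite.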

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: [HACKERS] Block level parallel vacuum at 2019-12-30 13:07:14 from Tomas Vondra
