Re: [HACKERS] Block level parallel vacuum

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-02-05 03:14:38
Message-ID: CAJrrPGdALjr9veOoiM=s7sNhm0pYo8d1GjQgwK1qn53rCkYhfQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> >
> > + * Before starting parallel index vacuum and parallel cleanup index we launch
> > + * parallel workers. All parallel workers will exit after processed all indexes
> >
> > parallel vacuum index and parallel cleanup index?
> >
> >
>
> ISTM we're using terms like "index vacuuming", "index cleanup" and "FSM
> vacuuming" in vacuumlazy.c, so maybe "parallel index vacuuming" and
> "parallel index cleanup" would be better?
>

OK.

> > + /*
> > + * If there is already-updated result in the shared memory we
> > + * use it. Otherwise we pass NULL to index AMs and copy the
> > + * result to the shared memory segment.
> > + */
> > + if (lvshared->indstats[idx].updated)
> > + result = &(lvshared->indstats[idx].stats);
> >
> > I don't really see a need for the flag to differentiate the stats pointer
> > between the first run and the second run. I don't see any problem in
> > passing the stats directly, since the same stats are updated on the worker
> > side and on the leader side. Anyway, no two processes will do the index
> > vacuum at the same time. Am I missing something?
> >
> > Even if this flag is to identify whether the stats are updated or not
> > before writing them, I don't see a need for it compared to normal vacuum.
> >
>
> Passing stats = NULL to amvacuumcleanup and ambulkdelete means that this
> is the first execution. For example, btvacuumcleanup skips cleanup if
> it's not NULL. In the normal vacuum we pass NULL to ambulkdelete or
> amvacuumcleanup on the first call, and they store the result
> stats in memory allocated locally. Therefore in the
> parallel vacuum I think that both worker and leader need to move it to
> the shared memory and mark it as updated, as a different worker could
> vacuum different indexes the next time.
>

OK, I understand the point. But btbulkdelete allocates the memory whenever
the stats are NULL, so I don't see a problem with it.

The only problem is with btvacuumcleanup. When there are no dead tuples
present in the table, btbulkdelete is not called and btvacuumcleanup is
called directly at the end of vacuum; in that scenario the code flow
differs based on the stats. So why can't we use the number of dead tuples
to differentiate, instead of adding another flag? Also, this scenario does
not occur very often, so avoiding the memcpy for normal operations would be
better. It may be a small gain; it is just a thought.
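
To make the flag protocol under discussion concrete, here is a minimal
sketch of it as I read the explanation above. It is not code from the patch:
MockStats, MockSharedIndStats, mock_bulkdelete and vacuum_one_index are
hypothetical stand-ins, malloc-style allocation replaces the real memory
contexts, and only the bulk-delete path is modelled (the btvacuumcleanup
case with no dead tuples is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-index result struct. */
typedef struct MockStats
{
    double      tuples_removed;
} MockStats;

/* Hypothetical per-index slot in the shared segment (flag + stats). */
typedef struct MockSharedIndStats
{
    bool        updated;    /* true once a result has been copied in */
    MockStats   stats;
} MockSharedIndStats;

/*
 * Mock bulk delete: like btbulkdelete, it allocates the result itself
 * when called with stats == NULL, and accumulates into it otherwise.
 */
static MockStats *
mock_bulkdelete(MockStats *stats, double removed)
{
    if (stats == NULL)
    {
        stats = calloc(1, sizeof(MockStats));
        assert(stats != NULL);
    }
    stats->tuples_removed += removed;
    return stats;
}

/*
 * One pass over one index by whichever process picks it up.  On the
 * first pass the locally allocated result is copied into the shared
 * slot and the flag is set; later passes update the shared copy in place.
 */
static void
vacuum_one_index(MockSharedIndStats *slot, double removed)
{
    MockStats  *in = slot->updated ? &slot->stats : NULL;
    MockStats  *out = mock_bulkdelete(in, removed);

    if (!slot->updated)
    {
        memcpy(&slot->stats, out, sizeof(MockStats));
        slot->updated = true;
        free(out);
    }
}
```

In this sketch the memcpy happens only once per index, on the first pass;
every later pass works directly on the shared copy.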

>
> > + initStringInfo(&buf);
> > + appendStringInfo(&buf,
> > + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> > + "launched %d parallel vacuum workers %s (planned: %d",
> > + lvstate->pcxt->nworkers_launched),
> > + lvstate->pcxt->nworkers_launched,
> > + for_cleanup ? "for index cleanup" : "for index vacuum",
> > + lvstate->pcxt->nworkers);
> > + if (lvstate->options.nworkers > 0)
> > + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
> >
> > What is the difference between planned workers and requested workers?
> > Aren't both the same?
>
> The request is the parallel degree that is specified explicitly by
> the user, whereas the planned is the actual number we planned based on
> the number of indexes the table has. For example, if we do 'VACUUM
> (PARALLEL 3000) tbl' where tbl has 4 indexes, the request is 3000
> and the planned is 4. Also, if max_parallel_maintenance_workers is 2,
> the planned is 2.
>

OK.
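
For the record, the requested/planned distinction as I understand it from
the explanation above could be sketched like this. compute_planned_workers
is a hypothetical helper, not a function from the patch, and treating an
explicit request smaller than the index count as a cap is my assumption;
the two examples given above only cover the index-count and
max_parallel_maintenance_workers limits.

```c
#include <assert.h>

/*
 * Hypothetical helper: derive the planned number of workers.
 * nrequested <= 0 means the user gave no explicit parallel degree.
 */
static int
compute_planned_workers(int nrequested, int nindexes,
                        int max_parallel_maintenance_workers)
{
    int         planned;

    assert(nindexes >= 0);
    assert(max_parallel_maintenance_workers >= 0);

    /* Never plan more workers than there are indexes to process. */
    planned = nindexes;

    /* Assumption: a smaller explicit request lowers the plan. */
    if (nrequested > 0 && nrequested < planned)
        planned = nrequested;

    /* The GUC is an upper bound in every case. */
    if (planned > max_parallel_maintenance_workers)
        planned = max_parallel_maintenance_workers;

    return planned;
}
```

With 4 indexes and a request of 3000 this gives 4, and with
max_parallel_maintenance_workers set to 2 it gives 2, matching the two
examples above.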

Regards,
Haribabu Kommi
Fujitsu Australia
