From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-03-19 10:01:06
Message-ID: CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=kRagFLjUiBg@mail.gmail.com
Lists: pgsql-hackers
On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ(at)mail(dot)gmail(dot)com>
> > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > almost the same. I suspect that the indexes are too small, all of
> > > the index pages were in memory, and the CPU was saturated. Maybe you
> > > had four cores, so parallel workers beyond that number had no
> > > effect. Other normal backends would have been able to do almost
> > > nothing meanwhile. Usually the number of parallel workers is
> > > determined so that I/O capacity is filled up, but this feature
> > > intermittently saturates CPU capacity under such a situation.
> > >
> >
> > I'm sorry I didn't make it clear enough. If the parallel degree is
> > higher than 'the number of indexes - 1', redundant workers are not
> > launched. So for indexes=4, 8, 16 the number of actually launched
> > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > shows almost the same execution time in the cases where nindexes <=
> > parallel_degree.
>
> In the 16 indexes case, the performance saturated at 4 workers,
> which contradicts your explanation.
Because the machine I used has 4 cores, the performance doesn't improve
even if more than 4 parallel workers are launched.
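
For illustration, here is a minimal standalone sketch of the capping rule I
described above (the function name and layout are invented for this example,
and it ignores other limits such as the max_parallel_maintenance_workers
setting; the real logic lives in the patch):

#include <stdio.h>

/*
 * Hypothetical sketch: how many parallel vacuum workers to launch.
 * The leader also vacuums indexes, so at most (nindexes - 1) workers
 * are useful; the requested degree caps it further.
 */
static int
compute_parallel_workers(int nindexes, int requested)
{
    int     nworkers = nindexes - 1;    /* leave one index for the leader */

    if (requested < nworkers)
        nworkers = requested;
    return (nworkers > 0) ? nworkers : 0;
}

int
main(void)
{
    /* indexes = 4, 8, 16 with parallel_degree = 16 -> 3, 7, 15 workers */
    printf("%d %d %d\n",
           compute_parallel_workers(4, 16),
           compute_parallel_workers(8, 16),
           compute_parallel_workers(16, 16));
    return 0;
}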
>
> > I'll share the performance test results with larger tables and indexes.
> >
> > > I'm not sure, but what if we do index vacuum in a one-tuple-at-a-time
> > > manner? That is, heap vacuum passes dead tuples one by one (or
> > > buffers a few tuples) to workers, and workers process them not by
> > > bulkdelete but by a plain tuple_delete (which we don't have). That could
> > > avoid the heap scan sleeping while the index bulkdelete runs.
> > >
> >
> > Just to be clear, in parallel lazy vacuum all parallel vacuum
> > processes including the leader process do index vacuuming; no one
> > sleeps during index vacuuming. The leader process does the heap
> > scan and launches parallel workers before index vacuuming. Each
> > process exclusively processes indexes one by one.
>
> The leader doesn't continue the heap scan while index vacuuming is
> running. And the index page scan seems to eat up CPU easily. If
> index vacuum could run simultaneously with the next heap scan
> phase, we could make the index scan finish at almost the same time as
> the next round of heap scan. It would reduce the (possible) CPU
> contention. But this requires twice as much shared
> memory as the current implementation.
Yeah, I've considered something like a pipelining approach, where one
process continues to queue the dead tuples and other processes fetch and
process them during index vacuuming, but the current version of the patch
employs the simplest approach as the first step.
Once we have the retail index deletion approach, we might be able to use
it for parallel vacuum.
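
Just to illustrate the kind of pipelining I mean (this is not what the patch
does, and all names here are invented for the example), a toy
single-producer/single-consumer sketch in plain C with pthreads:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model only: the heap-scanning side enqueues "dead tuple" ids
 * while an index-vacuuming side dequeues and deletes them one by one,
 * instead of waiting for a full batch and calling bulkdelete.
 */
#define QUEUE_SIZE 1024

static struct
{
    unsigned long items[QUEUE_SIZE];
    unsigned long head;         /* next slot to consume */
    unsigned long tail;         /* next slot to fill */
    bool        done;           /* producer finished the heap scan */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} queue = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .cond = PTHREAD_COND_INITIALIZER
};

/* heap scan side: push one dead tuple id, waiting if the queue is full */
static void
enqueue_dead_tuple(unsigned long tid)
{
    pthread_mutex_lock(&queue.lock);
    while (queue.tail - queue.head == QUEUE_SIZE)
        pthread_cond_wait(&queue.cond, &queue.lock);
    queue.items[queue.tail % QUEUE_SIZE] = tid;
    queue.tail++;
    pthread_cond_broadcast(&queue.cond);
    pthread_mutex_unlock(&queue.lock);
}

/* index vacuum side: delete each id as it arrives */
static void *
index_vacuum_worker(void *arg)
{
    long        ndeleted = 0;

    (void) arg;
    pthread_mutex_lock(&queue.lock);
    for (;;)
    {
        while (queue.head == queue.tail && !queue.done)
            pthread_cond_wait(&queue.cond, &queue.lock);
        if (queue.head == queue.tail && queue.done)
            break;
        queue.head++;           /* the per-tuple index deletion would go here */
        ndeleted++;
        pthread_cond_broadcast(&queue.cond);
    }
    pthread_mutex_unlock(&queue.lock);
    printf("worker deleted %ld index entries\n", ndeleted);
    return NULL;
}

int
main(void)
{
    pthread_t   worker;

    pthread_create(&worker, NULL, index_vacuum_worker, NULL);
    for (unsigned long tid = 0; tid < 10000; tid++)
        enqueue_dead_tuple(tid);

    pthread_mutex_lock(&queue.lock);
    queue.done = true;
    pthread_cond_broadcast(&queue.cond);
    pthread_mutex_unlock(&queue.lock);

    pthread_join(worker, NULL);
    return 0;
}

In the actual patch the dead tuple array lives in DSM and workers still call
bulkdelete, so the above only shows the shape of the alternative.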
>
> > Such an index deletion method could be an optimization, but I'm not sure
> > that calling tuple_delete many times would be faster than one
> > bulkdelete. If there are many dead tuples, vacuum has to call
> > tuple_delete as many times as there are dead tuples. In general, one
> > seqscan is faster than tons of index scans. There is a proposal for such
> > one-by-one index deletions[1], but it's not a replacement for bulkdelete.
>
> I'm not sure what you mean by 'replacement', but it depends on how
> large a part of a table is removed at once, as mentioned in the
> thread. But unfortunately it doesn't seem easy to do.
>
> > > > Attached are the updated version patches. The patches apply to the current
> > > > HEAD cleanly, but the 0001 patch still changes the vacuum option to a
> > > > Node since it's under discussion. After the direction has been
> > > > decided, I'll update the patches.
> > >
> > > As for the to-be-or-not-to-be-a-node problem, I don't think it is
> > > needed, but from the point of view of consistency it seems reasonable,
> > > and it is seen in other nodes that a *Stmt node holds an options node. But
> > > makeVacOpt and its usage, and the subsequent operations on the node,
> > > look somewhat strange. Why don't you just do
> > > "makeNode(VacuumOptions)"?
> >
> > Thank you for the comment, but this part has gone away since a recent
> > commit changed the grammar production of the vacuum command.
>
> Oops!
>
>
> > > >+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > > >+ maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
> > >
> > > If I understand this correctly, nindexes is always > 1 there. At
> > > least assert that it is > 0 there.
> > >
> > > >+ estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
> > >
> > > I don't think the name is good. (dt meant "detach" at first glance to me.)
> >
> > Fixed.
> >
> > >
> > > >+ if (lps->nworkers_requested > 0)
> > > >+ appendStringInfo(&buf,
> > > >+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
> > >
> > > "planned"?
> >
> > The 'planned' value shows how many parallel workers we planned to launch.
> > The degree of parallelism is determined based on either the user request
> > or the number of indexes that the table has.
> >
> > >
> > >
> > > >+ /* Get the next index to vacuum */
> > > >+ if (do_parallel)
> > > >+ idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> > > >+ else
> > > >+ idx = nprocessed++;
> > >
> > > It seems that both of the two cases can be handled using
> > > LVParallelState, and most of the branches on lps or do_parallel
> > > can be removed.
> > >
> >
> > Sorry, I couldn't get your comment. Did you mean to move nprocessed to
> > LVParallelState?
>
> Exactly. I meant letting lvshared point to private memory, but
> it might introduce confusion.
Hmm, I'm not sure it would be a good idea. It would introduce
confusion, as you mentioned. And since 'nprocessed' has to be a
pg_atomic_uint32 in parallel mode, we would end up with having
another branch anyway.
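
To show what I mean, here is a minimal standalone sketch of the two code
paths (C11 atomics stand in for pg_atomic_uint32 so the example compiles on
its own; the struct and function names are invented for illustration):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of the "claim the next index to vacuum" logic quoted above.
 * In the real patch the shared counter lives in DSM as a pg_atomic_uint32;
 * here a C11 atomic_uint stands in for it.
 */
typedef struct LVSharedSketch
{
    atomic_uint nprocessed;     /* shared counter, fetch-and-add in parallel mode */
} LVSharedSketch;

static unsigned
claim_next_index(bool do_parallel, LVSharedSketch *lvshared, unsigned *nprocessed)
{
    if (do_parallel)
        return atomic_fetch_add(&lvshared->nprocessed, 1);  /* like pg_atomic_fetch_add_u32 */
    else
        return (*nprocessed)++;     /* backend-local counter, no atomics needed */
}

int
main(void)
{
    LVSharedSketch shared;
    unsigned    local = 0;
    unsigned    nindexes = 4;
    unsigned    idx;

    atomic_init(&shared.nprocessed, 0);

    /* serial path: plain local counter */
    while ((idx = claim_next_index(false, &shared, &local)) < nindexes)
        printf("serial: vacuum index %u\n", idx);

    /* parallel path (one process here, but the counter is shared) */
    while ((idx = claim_next_index(true, &shared, &local)) < nindexes)
        printf("parallel: vacuum index %u\n", idx);
    return 0;
}

Moving nprocessed into LVParallelState wouldn't remove that branch, because
only the parallel path needs the atomic type.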
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center