From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-11-21 09:55:23
Message-ID: CAFiTN-s4xx38s0AS-BrNaFDcvpzmYbSgc2YeDVtaQF2CvVkk6w@mail.gmail.com
Lists: pgsql-hackers
On Thu, 21 Nov 2019, 14:15 Masahiko Sawada, <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> On Thu, 21 Nov 2019 at 14:32, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Nov 21, 2019 at 10:46 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > > >
> > > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > > > > >
> > > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > > > this information for index AMs. Basically, the index AM will expose a
> > > > > > > > variable amparallelvacuumoptions which can have the below options:
> > > > > > > >
> > > > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor
> > > > > > > > vacuumcleanup) can't be performed in parallel
> > > > > > >
> > > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > > > > want to support parallel vacuum don't have to set anything.
> > > > > > >
> > > > > >
> > > > > > Makes sense.
> > > > > >
> > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in
> > > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > > > flag)
> > > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be
> > > > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in
> > > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > > > and bloom will set this flag)
> > > > > > >
> > > > > > > I think gin and bloom don't need to set both but should set only
> > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > > > >
> > > > > > > And I'm going to disallow index AMs from setting both
> > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > > > by assertions; is that okay?
> > > > > > >
> > > > > >
> > > > > > Sounds reasonable to me.
> > > > > >
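To make the bit layout concrete, here is a minimal sketch of the proposed flags and one way the leader could test them. The helper function and its name are illustrative, not part of the patch; the code assumes PostgreSQL's usual headers (c.h) for uint8/bool, and it uses 0 for the "no parallel" case as agreed above.

/*
 * Bit values for amparallelvacuumoptions as proposed above.  Per the
 * discussion, "no parallel" is simply 0 so AMs that don't opt in need not
 * set anything.
 */
#define VACUUM_OPTION_NO_PARALLEL			0
#define VACUUM_OPTION_PARALLEL_BULKDEL		(1 << 1)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP	(1 << 2)
#define VACUUM_OPTION_PARALLEL_CLEANUP		(1 << 3)

/*
 * Illustrative helper: can this index participate in the parallel cleanup
 * phase, given whether a bulkdelete pass has already run?
 */
static bool
index_cleanup_is_parallel_safe(uint8 options, bool bulkdel_done)
{
	if (options & VACUUM_OPTION_PARALLEL_CLEANUP)
		return true;			/* cleanup is parallel-safe unconditionally */
	if (options & VACUUM_OPTION_PARALLEL_COND_CLEANUP)
		return !bulkdel_done;	/* parallel-safe only if bulkdelete was skipped */
	return false;
}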
> > > > > > Are you planning to include the changes related to I/O throttling
> > > > > > based on the discussion in the nearby thread [1]? I think you can do
> > > > > > that if you agree with the conclusion in the last email [1]; otherwise,
> > > > > > we can explore it separately.
> > > > >
> > > > > Yes, I agree. I'm going to include those changes in the next version
> > > > > of the patches. And I think we will be able to have more discussion
> > > > > based on the patch.
> > > > >
> > > >
> > > > I've attached the latest version patch set. The patch set includes all
> > > > discussed points regarding index AM options as well as shared cost
> > > > balance. Also, I added some test cases that use all types of index AM.
> > > >
> > > > During development I had one concern about the number of parallel
> > > > workers to launch. In the current design each index AM can choose
> > > > whether to participate in parallel bulk-deletion and parallel cleanup.
> > > > That also means the number of parallel workers to launch might differ
> > > > between parallel bulk-deletion and parallel cleanup. In the current
> > > > patch the leader always launches as many workers as there are indexes
> > > > that support either one, but that would not be efficient in some cases.
> > > > For example, if we have 3 indexes supporting only parallel bulk-deletion
> > > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > > 5 workers for each execution but some workers would do nothing at all.
> > > > To deal with this problem, I wonder if we can improve parallel query so
> > > > that the leader process creates a parallel context with the maximum
> > > > number of indexes but can launch only a subset of the workers instead
> > > > of all of them.
> > > >
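A minimal sketch of the per-phase worker count that the above paragraph argues for. Everything here is illustrative, not from the patch: the function name and the options array are hypothetical, the flag macros are the ones proposed earlier in the thread, and capping by max_parallel_maintenance_workers is omitted.

/*
 * Hypothetical sketch: count how many indexes can use a worker in the given
 * phase, so the leader launches only that many rather than the number of
 * indexes supporting either phase.
 */
static int
compute_parallel_vacuum_workers(const uint8 *index_options, int nindexes,
								bool bulkdel_phase)
{
	int			nworkers = 0;

	for (int i = 0; i < nindexes; i++)
	{
		uint8		opts = index_options[i];

		if (bulkdel_phase)
		{
			if (opts & VACUUM_OPTION_PARALLEL_BULKDEL)
				nworkers++;
		}
		else
		{
			if (opts & (VACUUM_OPTION_PARALLEL_CLEANUP |
						VACUUM_OPTION_PARALLEL_COND_CLEANUP))
				nworkers++;
		}
	}

	return nworkers;
}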
> > > +
> > > + /* compute new balance by adding the local value */
> > > + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> > > + new_balance = shared_balance + VacuumCostBalance;
> > >
> > > + /* also compute the total local balance */
> > > + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> > > +
> > > + if ((new_balance >= VacuumCostLimit) &&
> > > + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> > > + {
> > > + /* compute sleep time based on the local cost balance */
> > > + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > > + new_balance = shared_balance - VacuumCostBalanceLocal;
> > > + VacuumCostBalanceLocal = 0;
> > > + }
> > > +
> > > + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> > > + &shared_balance,
> > > + new_balance))
> > > + {
> > > + /* Updated successfully, break */
> > > + break;
> > > + }
> > > While looking at the shared costing delay part, I noticed that while
> > > checking the delay condition we consider local_balance, which is
> > > VacuumCostBalanceLocal + VacuumCostBalance, but while computing the new
> > > balance we reduce the shared balance only by VacuumCostBalanceLocal;
> > > shouldn't it be reduced by local_balance? I see that later we add
> > > VacuumCostBalance to VacuumCostBalanceLocal, so we are not losing
> > > accounting for this balance. But I feel it is not right to compare based
> > > on one value and operate based on another. I think we can immediately
> > > set VacuumCostBalanceLocal += VacuumCostBalance before checking the
> > > condition.
> > >
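For clarity, a sketch of the reordering being suggested, applied to the quoted snippet. This is a fragment rather than a full function: variable declarations and the surrounding retry loop are omitted, and the variable names simply follow the quoted code.

/* Fold the current delta into the local balance first, so the comparison,
 * the sleep-time computation, and the shared-balance adjustment below all
 * operate on the same quantity. */
VacuumCostBalanceLocal += VacuumCostBalance;

/* compute new balance by adding the local value */
shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
new_balance = shared_balance + VacuumCostBalance;

if ((new_balance >= VacuumCostLimit) &&
	(VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
{
	/* compute sleep time based on the local cost balance */
	msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
	new_balance = shared_balance - VacuumCostBalanceLocal;
	VacuumCostBalanceLocal = 0;
}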
> >
> > +/*
> > + * index_parallelvacuum_estimate - estimate shared memory for parallel vacuum
> > + *
> > + * Currently, we don't pass any information to the AM-specific estimator,
> > + * so it can probably only return a constant.  In the future, we might need
> > + * to pass more information.
> > + */
> > +Size
> > +index_parallelvacuum_estimate(Relation indexRelation)
> > +{
> > + Size nbytes;
> > +
> > + RELATION_CHECKS;
> > +
> > + /*
> > + * If amestimateparallelvacuum is not provided, assume only
> > + * IndexBulkDeleteResult is needed.
> > + */
> > + if (indexRelation->rd_indam->amestimateparallelvacuum != NULL)
> > + {
> > + nbytes = indexRelation->rd_indam->amestimateparallelvacuum();
> > + Assert(nbytes >= MAXALIGN(sizeof(IndexBulkDeleteResult)));
> > + }
> > + else
> > + nbytes = MAXALIGN(sizeof(IndexBulkDeleteResult));
> > +
> > + return nbytes;
> > +}
> >
> > In the v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I
> > am a bit doubtful about this kind of arrangement, where the code in the
> > "if" branch is always unreachable with the current AMs. I am not sure what
> > the best way to handle this is; should we just drop
> > amestimateparallelvacuum altogether?
>
> IIUC the motivation for amestimateparallelvacuum is third-party index AMs.
> If such an AM allocates more memory than IndexBulkDeleteResult, as the
> current gist indexes do (although we'll change that), it will break the
> index statistics of other indexes or could even cause a crash. I'm not sure
> whether such third-party index AMs exist, and it's true that none of the
> index AMs in the postgres code will use this callback, as you mentioned,
> but I think we need to take care of it because such usage is still possible.
>
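As an example of the kind of third-party usage meant here, a hypothetical AM with extra per-index counters might provide the callback like this. The struct, its fields, and the function are all made up for illustration; the only constraints taken from the quoted code are that the callback takes no arguments and that the returned size must be at least MAXALIGN(sizeof(IndexBulkDeleteResult)).

/* Hypothetical stats struct of a third-party index AM.  The standard result
 * is the first member so the lazy vacuum code can treat the shared memory
 * area as an IndexBulkDeleteResult. */
typedef struct MyAmBulkDeleteResult
{
	IndexBulkDeleteResult std;		/* required leading member */
	uint64		pages_rewritten;	/* AM-specific counter (illustrative) */
	uint64		chains_collapsed;	/* AM-specific counter (illustrative) */
} MyAmBulkDeleteResult;

/* amestimateparallelvacuum callback: tell the leader how much DSM to reserve
 * per index for this AM's statistics. */
static Size
myam_estimateparallelvacuum(void)
{
	return MAXALIGN(sizeof(MyAmBulkDeleteResult));
}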
> > Because currently we are just providing a size estimate function without
> > a copy function, even if in the future some AM gives an estimate of the
> > size of its stats, we cannot directly memcpy the stats from local memory
> > to shared memory; we might then also need a copy function from the AM so
> > that it can flatten the stats and store them in the proper format?
>
> I might be missing something, but why can't we copy the stats from local
> memory to the DSM without a callback for copying stats? The lazy vacuum
> code will get the pointer to the stats that are allocated by the index AM,
> and it knows their size. So I think we can just memcpy them to the DSM.
>
Oh sure. But what I meant is that an AM may keep pointers in its stats, as
GistBulkDeleteResult does, so we might not be able to copy it directly from
outside the AM. So I thought that if we had a callback for the copy, then the
AM could flatten the stats such that IndexBulkDeleteResult comes first,
followed by the AM-specific stats. Yeah, but someone may argue that we might
as well force the AM to return the stats in a form that can be memcpy'd
directly. So I think I am fine with the way it is.
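To spell out the pointer problem, a sketch follows: both structs are simplified and hypothetical (only IndexBulkDeleteResult, BlockNumber, and FLEXIBLE_ARRAY_MEMBER are real PostgreSQL names). A result that stores a pointer cannot usefully be memcpy'd into DSM, while a flattened, variable-length layout can.

/* Pointer-carrying result, in the style of GistBulkDeleteResult: memcpy'ing
 * this into DSM copies only the pointer value, which is meaningless in
 * another process, and loses the array it points to. */
typedef struct PointerishBulkDeleteResult
{
	IndexBulkDeleteResult std;
	BlockNumber *internal_pages;	/* palloc'd, process-local array */
	int			ninternal_pages;
} PointerishBulkDeleteResult;

/* Hypothetical flattened form: a fixed header followed inline by the array,
 * so the whole result can be copied into shared memory with one memcpy. */
typedef struct FlatBulkDeleteResult
{
	IndexBulkDeleteResult std;
	int			ninternal_pages;
	BlockNumber internal_pages[FLEXIBLE_ARRAY_MEMBER];
} FlatBulkDeleteResult;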