Re: initial pruning in parallel append

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initial pruning in parallel append
Date: 2023-08-09 10:22:39
Message-ID: CA+HiwqEUR6XTaSfbmU50XgZXPy2GXgrJ-juFuLFTTTY=ZoBXrQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 8, 2023 at 11:16 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 8, 2023 at 2:58 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > Or we could consider something like the patch I mentioned in my 1st
> > email. The idea there was to pass the pruning result via a separate
> > channel, not the DSM chunk linked into the PlanState tree. To wit, on
> > the leader side, ExecInitParallelPlan() puts the serialized
> > List-of-Bitmapset into the shm_toc with a dedicated PARALLEL_KEY,
> > alongside PlannedStmt, ParamListInfo, etc. The List-of-Bitmapset is
> > initialized during the leader's ExecInitNode(). On the worker side,
> > ExecParallelGetQueryDesc() reads the List-of-Bitmapset string and puts
> > the resulting node into the QueryDesc, that ParallelQueryMain() then
> > uses to do ExecutorStart() which copies the pointer to
> > EState.es_part_prune_results. ExecInitAppend() consults
> > EState.es_part_prune_results and uses the Bitmapset from there, if
> > present, instead of performing initial pruning.
>
> This also seems reasonable.
>
> > I'm assuming it's not
> > too ugly if ExecInitAppend() uses IsParallelWorker() to decide whether
> > it should be writing to EState.es_part_prune_results or reading from
> > it -- the former if in the leader and the latter in a worker.
>
> I don't think that's too ugly. I mean you have to have an if statement
> someplace.

Yes, that makes sense.

It's just that I wondered whether I had thought hard enough about the
options before adding a new IsParallelWorker() check, because there
aren't many instances of IsParallelWorker() in the generic executor
code. I think that's because most parallel-worker-specific logic lives
in execParallel.c or in Exec*Worker() functions outside that file, so
the generic code stays as parallel-query-agnostic as possible.
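
To be a bit more concrete, this is roughly the shape I have in mind
for ExecInitAppend(); the names are approximate (they assume 0001's
part_prune_index and es_part_prune_infos) and this isn't the patch's
actual code:

    if (node->part_prune_index >= 0)
    {
        if (IsParallelWorker())
        {
            /* Reuse the result the leader shipped via the DSM. */
            validsubplans = (Bitmapset *)
                list_nth(estate->es_part_prune_results,
                         node->part_prune_index);
            /* (setup of the PartitionPruneState for exec-time
             * pruning omitted here) */
        }
        else
        {
            /* Leader (or non-parallel query): prune as we do today. */
            pruneinfo = (PartitionPruneInfo *)
                list_nth(estate->es_part_prune_infos,
                         node->part_prune_index);
            prunestate =
                ExecInitPartitionPruning(&appendstate->ps,
                                         list_length(node->appendplans),
                                         pruneinfo,
                                         &validsubplans);
            /* Stash the result for ExecInitParallelPlan() to ship. */
            estate->es_part_prune_results =
                lappend(estate->es_part_prune_results, validsubplans);
        }
    }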

> > If we
> > are to go with this approach we will need to un-revert ec386948948c,
> > which moved PartitionPruneInfo nodes out of Append/MergeAppend nodes
> > to a List in PlannedStmt (copied into EState.es_part_prune_infos),
> > such that es_part_prune_results mirrors es_part_prune_infos.
>
> The comment for the revert (which was
> 5472743d9e8583638a897b47558066167cc14583) points to
> https://www.postgresql.org/message-id/4191508.1674157166@sss.pgh.pa.us
> as the reason, but it's not very clear to me why that email led to
> this being reverted. In any event, I agree that if we go with your
> idea to pass this via a separate PARALLEL_KEY, unreverting that patch
> seems to make sense, because otherwise I think we don't have a fast
> way to find the nodes that contain the state that we care about.

OK, I've attached the unreverted patch that adds
EState.es_part_prune_infos as 0001.

0002 adds EState.es_part_prune_results. The parallel query leader
performs the initial pruning steps contained in each PartitionPruneInfo
and stores the resulting bitmapset of initially valid subplans in that
list, at the same index as the PartitionPruneInfo's index in
es_part_prune_infos. ExecInitParallelPlan() serializes
es_part_prune_results and stores it in the DSM. A worker initializes
es_part_prune_results in its own EState by reading the leader's value
from the DSM; then, for each PartitionPruneInfo in its own copy of
EState.es_part_prune_infos, it gets the set of initially valid subplans
by referring to es_part_prune_results instead of performing initial
pruning again.
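
To illustrate, the DSM plumbing would look roughly like this (just a
sketch -- PARALLEL_KEY_PRUNE_RESULTS is a made-up key name here, and
the shm_toc_estimate_* calls are omitted):

    /* Leader, in ExecInitParallelPlan(): */
    prune_results_data = nodeToString(estate->es_part_prune_results);
    prune_results_space =
        shm_toc_allocate(pcxt->toc, strlen(prune_results_data) + 1);
    strcpy(prune_results_space, prune_results_data);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_PRUNE_RESULTS,
                   prune_results_space);

    /* Worker, in ExecParallelGetQueryDesc(): */
    prune_results_str =
        shm_toc_lookup(toc, PARALLEL_KEY_PRUNE_RESULTS, false);
    part_prune_results = (List *) stringToNode(prune_results_str);
    /* ... hang it off the QueryDesc so that ExecutorStart() can copy
     * the pointer into estate->es_part_prune_results ... */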

Should workers, as Tom says, instead perform the initial pruning
themselves and cross-check the result, raising an error if it doesn't
match the leader's? The error message couldn't point out which
function is at fault, though a user would at least know that they have
functions in their database with wrong volatility markings.
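
For reference, that cross-check in the worker would amount to
something like this (sketch; leader_validsubplans standing for the
value read from es_part_prune_results):

    /* after redoing the initial pruning steps in the worker */
    if (!bms_equal(validsubplans, leader_validsubplans))
        elog(ERROR,
             "initial pruning result in worker does not match the leader's");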

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pla.patch application/octet-stream 19.9 KB
v1-0002-Share-initial-pruning-result-between-parallel-que.patch application/octet-stream 10.4 KB
