From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | James Coleman <jtc331(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Parallelize correlated subqueries that execute within each worker |
Date: | 2021-11-15 15:01:37 |
Message-ID: | CA+TgmoaAPfPcQx0uYLusU4A+Vm5Cr2F8irxo=xf8BQ-97YyZ7A@mail.gmail.com |
Lists: | pgsql-hackers |

On Wed, Nov 3, 2021 at 1:34 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> As I understand the current code, parallel plans are largely chosen
> based not on where it's safe to insert a Gather node but rather by
> determining if a given path is parallel safe. Through that lens params
> are a bit of an odd man out -- they aren't inherently unsafe in the
> way a parallel-unsafe function is, but they can only be used in
> parallel plans under certain conditions (whether because of project
> policy, performance, or missing infrastructure).

Right.

> Introducing consider_parallel_rechecking_params and
> parallel_safe_ignoring_params allows us to keep more context on params
> and make a more nuanced decision at the proper level of the plan. This
> is what I mean by "rechecked in the using context", though I realize
> now that both "recheck" and "context" are overloaded terms in the
> project, so they don't describe the concept particularly clearly. When a
> path relies on params we can only make a final determination about its
> parallel safety if we know whether or not the current parallel node
> can provide the param's value. We don't necessarily know that
> information until we attempt to generate a full parallel node in the
> plan (I think what you're describing as "inserting a Gather node")
> since the param may come from another node in the plan. These new
> values allow us to do that by tracking tentatively parallel-safe
> subplans (given proper Gather node placement) and delaying the
> parallel-safety determination until the point at which a param is
> available (or not).

So I think I agree with you here. But I don't like all of this
"ignoring_params" stuff and I don't see why it's necessary. Say we
don't have both parallel_safe and parallel_safe_ignoring_params. Say
we just have parallel_safe. If the plan will be parallel safe if the
params are available, we label it parallel safe. If the plan will not
be parallel safe even if the params are available, we say it's not
parallel safe. Then, when we get to generate_gather_paths(), we don't
generate any paths if there are required parameters that are not
available. What's wrong with that approach?
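
As a rough illustration of that single-flag approach, here is a standalone C sketch. It is not PostgreSQL code: SketchPath, required_params, param_bit() and can_gather() are made-up names for this example, and a 64-bit mask is only an assumed stand-in for however the planner would actually track which params a path needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a planner path; NOT PostgreSQL's Path struct. */
typedef struct SketchPath
{
    bool     parallel_safe;    /* safe, provided its params are available */
    uint64_t required_params;  /* bit i set => path needs param i from outside */
} SketchPath;

/* Build a one-element mask; this sketch only handles param ids 0..63. */
static uint64_t
param_bit(unsigned id)
{
    assert(id < 64);
    return UINT64_C(1) << id;
}

/*
 * Decide whether a Gather could be placed on top of the path.
 * The safety flag alone is not enough: every required param must
 * also be available at the point where the Gather would go.
 */
static bool
can_gather(const SketchPath *path, uint64_t available_params)
{
    if (!path->parallel_safe)
        return false;
    return (path->required_params & ~available_params) == 0;
}
```

In this sketch, a path that is parallel safe but still needs param 3 gets no Gather when nothing is available, and does get one once param 3 is available; a path that is not parallel safe never gets one, whatever params are available.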

Maybe it's clearer to say this: I feel like one extra Boolean is
either too much or too little. I think maybe it's not even needed. But
if it is needed, then why just a bool instead of, say, a Bitmapset of
params that are needed, or something?
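
To show what a set of needed params could express that a single bool cannot, here is a small standalone C sketch. Everything in it is an assumption made for illustration: ParamSet is a 64-bit mask standing in for a Bitmapset, and paramset_make(), needed_above_join() and params_satisfied() are hypothetical helpers, not existing planner functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of param ids (ids 0..63 only). */
typedef uint64_t ParamSet;

static ParamSet
paramset_make(unsigned id)
{
    assert(id < 64);
    return (ParamSet) 1 << id;
}

/*
 * Params still needed from above a join: everything the outer side
 * needs, plus whatever the inner side needs that the join itself
 * does not supply (e.g. values passed from outer to inner).
 */
static ParamSet
needed_above_join(ParamSet outer_needed, ParamSet inner_needed,
                  ParamSet supplied_by_join)
{
    return outer_needed | (inner_needed & ~supplied_by_join);
}

/* True if every needed param is in the available set. */
static bool
params_satisfied(ParamSet needed, ParamSet available)
{
    return (needed & ~available) == 0;
}
```

With a set, each level of the plan can subtract the params it supplies and pass the remainder upward, so the decision at Gather time depends on exactly which params are still outstanding rather than on whether any param was ever involved.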

I'm sort of speaking from intuition here rather than sure knowledge. I
might be totally wrong.
--
Robert Haas
EDB: http://www.enterprisedb.com