Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: generic plans and "initial" pruning
Date: 2024-12-04 13:34:16
Message-ID: CA+HiwqH8N-SxEB6SysEBsYNgV_KJs66k9Z2SNmqVzbBP-60yWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas,

On Mon, Dec 2, 2024 at 3:36 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> Hi,
>
> I took a look at this patch, mostly to familiarize myself with the
> pruning etc. I have a bunch of comments, but all of that is minor,
> perhaps even nitpicking - with prior feedback from David, Tom and
> Robert, I can't really compete with that.

Thanks for looking at this. These are helpful.

> FWIW the patch needs a rebase, there's a minor bitrot - but it was
> simply enough to fix for a review / testing.
>
>
> 0001
> ----
>
> 1) But if we don't expect this error to actually happen, do we really
> need to make it ereport()? Maybe it should be plain elog(). I mean, it's
> "can't happen" and thus doesn't need translations etc.
>
> if (!bms_equal(relids, pruneinfo->relids))
> ereport(ERROR,
> errcode(ERRCODE_INTERNAL_ERROR),
> errmsg_internal("mismatching PartitionPruneInfo found at
> part_prune_index %d",
> part_prune_index),
> errdetail_internal("plan node relids %s, pruneinfo
> relids %s",
> bmsToString(relids),
> bmsToString(pruneinfo->relids)));

I'm fine with elog() here even if it causes the message to be longer:

elog(ERROR, "mismatching PartitionPruneInfo found at part_prune_index
%d (plan node relids %s, pruneinfo relids %s)

> Perhaps it should even be an assert?

I am not sure about that. Having a message handy might be good if a
user ends up hitting this case for whatever reason, like trying to run
a corrupted plan.

> 2) unnecessary newline added to execPartition.h

Perhaps you meant "removed". Fixed.

> 3) this comment in EState doesn't seem very helpful
>
> List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */

Agreed, fixed to be like the comment for es_rteperminfos:

List *es_part_prune_infos; /* List of PartitionPruneInfo */

> 5) PlannerGlobal
>
> /* List of PartitionPruneInfo contained in the plan */
> List *partPruneInfos;
>
> Why does this say "contained in the plan" unlike the other fields? Is
> there some sort of difference? I'm not saying it's wrong.

Ok, maybe the following is a bit more helpful and like the comment for
other fields:

/* "flat" list of PartitionPruneInfos */
List *partPruneInfos;

> 0002
> ----
>
> 1) Isn't it weird/undesirable partkey_datum_from_expr() loses some of
> the asserts? Would the assert be incorrect in the new implementation, or
> are we removing it simply because we happen to not have one of the fields?

The former -- the asserts would be incorrect in the new implementation
-- because in the new implementation a standalone ExprContext is used
that is independent of the parent PlanState (when available) for both
types of runtime pruning.

The old asserts, particularly the second one, weren't asserting
something very useful anyway, IMO. What I mean is that the
ExprContext provided in the PartitionPruneContext to be the same as
the parent PlanState's ps_ExprContext isn't critical to the code that
follows. Nor whether the PlanState is available or not.

> 2) inconsistent spelling: run-time vs. runtime

I assume you meant in this comment:

* estate The EState for the query doing runtime pruning

Fixed by using run-time, which is a more commonly used term in the
source code than runtime.

> 3) PartitionPruneContext.is_valid - I think I'd rename the flag to
> "initialized" or something like that. The "is_valid" is a bit confusing,
> because it might seem the context can get invalidated later, but AFAICS
> that's not the case - we just initialize it lazily.

Agree that "initialized" is better, so renamed.

> 0003
> ----
>
> 1) In InitPlan I'd move
>
> estate->es_part_prune_infos = plannedstmt->partPruneInfos;
>
> before the comment, which is more about ExecDoInitialPruning.

Makes sense, done.

> 2) I'm not quite sure what "exec" partition pruning is?
>
> /*
> * ExecInitPartitionPruning
> * Initialize the data structures needed for runtime "exec" partition
> * pruning and return the result of initial pruning, if available.
>
> Is that the same thing as "runtime pruning"?

"Exec" pruning refers to pruning performed during execution, using
PARAM_EXEC parameters. In contrast, "init" pruning occurs during plan
initialization, using parameters whose values remain constant during
execution, such as PARAM_EXTERN parameters and stable functions.

Before this patch, the ExecInitPartitionPruning function, called
during ExecutorStart(), performed "init" pruning and set up state in
the PartitionPruneState for subsequent "exec" pruning during
ExecutorRun(). With this patch, "init" pruning is performed well
before this function is called, leaving its sole responsibility to
setting up the state for "exec" pruning. It may be worth renaming the
function to better reflect this new role, rather than updating only
the comment.

Actually, that is what I decided to do in the attached, along with
some other adjustments like moving ExecDoInitialPruning() to
execPartition.c from execMain.c, fixing up some obsolete comments,
etc.

> 0004
> ----
>
> 1) typo: paraller/parallel

Oops, fixed.

> 2) What about adding an assert to ExecFindMatchingSubPlans, to check
> valisubplan_rtis is not NULL? It's just mentioned in a comment, but
> better to explicitly enforce that?

Good idea, done.

>
> 2) It may not be quite clear why ExecInitUpdateProjection() switches to
> mt_updateColnosLists. Should that be explained in a comment, somewhere?

There is a comment in the ModifyTableState struct definition:

/*
* List of valid updateColnosLists. Contains only those belonging to
* unpruned relations from ModifyTable.updateColnosLists.
*/
List *mt_updateColnosLists;

It seems redundant to reiterate this in ExecInitUpdateProjection().

> 3) unnecessary newline in ExecLookupResultRelByOid

Removed.

> 0005
> ----
>
> 1) auto_explain.c - So what happens if the plan gets invalidated? The
> hook explain_ExecutorStart returns early, but then what? Does that break
> the user session somehow, or what?

It will get called again after ExecutorStartExt() loops back to do
ExecutorStart() with a new updated plan tree.

> 2) Isn't it a bit fragile if this requires every extension to update
> and add the ExecPlanStillValid() calls to various places?

The ExecPlanStillValid() call only needs to be added immediately after
the call to standard_ExecutorStart() in an extension's
ExecutorStart_hook() implementation.

> What if an
> extension doesn't do that? What weirdness will happen?

The QueryDesc.planstate won't contain a PlanState tree for starters
and other state information that InitPlan() populates in EState based
on the PlannedStmt.

> Maybe it'd be
> possible to at least check this in some other executor hook? Or at least
> we could ensure the check was done in assert-enabled builds? Or
> something to make extension authors aware of this?

I've added a note in the commit message, but if that's not enough, one
idea might be to change the return type of ExecutorStart_hook so that
the extensions that implement it are forced to be adjusted. Say, from
void to bool to indicate whether standard_ExecutorStart() succeeded
and thus created a "valid" plan. I had that in the previous versions
of the patch. Thoughts?

> Aside from going through the patches, I did a simple benchmark to see
> how this works in practice. I did a simple test, with pgbench -S and
> variable number of partitions/clients. I also varied the number of locks
> per transaction, because I was wondering if it may interact with the
> fast-path improvements. See the attached xeon.sh script and CSV with
> results from the 44/88-core machine.
>
> There's also two PDFs visualizing the results, to show the impact as a
> difference between "master" (no patches) vs. "pruning" build with v57
> applied. As usual, "green" is good (faster), read is "bad" (slower).
>
> For most combinations of parameters, there's no impact on throughput.
> Anything in 99-101% is just regular noise, possibly even more. I'm
> trying to reduce the noise a bit more, but this seems acceptable. I'd
> like to discuss three "cases" I see in the results:

Thanks for doing these benchmarks. I'll reply separately to discuss
the individual cases.

> costing / auto mode
> -------------------
>
> Anyway, this leads me to a related question - not quite a "bug" in the
> patch, but something to perhaps think about. And that's costing, and
> what "auto" should do.
>
> There are two PNG charts, showing throughput for runs with -M prepared
> and 1000 partitions. Each chart shows throughput for the three cache
> modes, and different client counts. There's a clear distinction between
> "master" and "patched" runs - the "generic" plans performed terribly, by
> orders of magnitude. With the patches it beats the "custom" plans.
>
> Which is great! But it also means that while "auto" used to do the right
> thing, with the patches that's not the case.
>
> AFAIK that's because we don't consider the runtime pruning when costing
> the plans, so the cost is calculated as if no pruning happened. And so
> it seems way more expensive than it should ... and it loses with the
> custom scans. Is that correct, or do I understand this wrong?

That's correct. The planner does not consider runtime pruning when
assigning costs to Append or MergeAppend paths in
create_{merge}append_path().

> Just to be clear, I'm not claiming the patch has to deal with this. I
> suppose it can be handled as a future improvement, and I'm not even sure
> there's a good way to consider this during costing. For example, can we
> estimate how many partitions will be pruned?

There have been discussions about this in the 2017 development thread
of run-time pruning [1] and likely at some later point in other
threads. One simple approach mentioned at [1] is to consider that
only 1 partition will be scanned for queries containing WHERE partkey
= $1, because only 1 partition can contain matching rows with that
condition.

I agree that this should be dealt with sooner than later so users get
generic plans even without having to use force_generic_plan.

I'll post the updated patches tomorrow.

--
Thanks, Amit Langote

[1] https://www.postgresql.org/message-id/CA%2BTgmoZv8sd9cKyYtHwmd_13%2BBAjkVKo%3DECe7G98tBK5Ejwatw%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-12-04 13:44:18 Re: Serverside SNI support in libpq
Previous Message Amul Sul 2024-12-04 13:02:33 Re: NOT ENFORCED constraint feature