Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: generic plans and "initial" pruning
Date: 2024-08-15 12:57:40
Message-ID: CA+HiwqHL=aGU9Y4RYXQ5VCp4L5NVdiaQLLoXN3NCQQQMKo0ByQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 15, 2024 at 4:23 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Aug 12, 2024 at 8:54 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > 1. I went through many iterations of the changes to ExecInitNode() to
> > return a partially initialized PlanState tree when it detects that the
> > CachedPlan was invalidated after locking a child table and to
> > ExecEndNode() to account for the PlanState tree sometimes being
> > partially initialized, but it still seems fragile and bug-prone to me.
> > It might be because this approach is fundamentally hard to get right
> > or I haven't invested enough effort in becoming more confident in its
> > robustness.
>
> Can you give some examples of what's going wrong, or what you think
> might go wrong?
>
> I didn't think there was a huge problem here based on previous
> discussion, but I could very well be missing some important challenge.

TBH, it's more of a hunch that people who are not involved in this
development might find the new reality, whereby execution is not
race-free until ExecutorRun(), hard to reason about.

That's perhaps true of the other approach too, whereby one would need
to consult a separate data structure that records the result of
pruning done in plancache.c to be sure that a given node of a plan
tree coming from a CachedPlan is safe to execute or otherwise operate on.

> > 2. Refactoring needed due to the ExecutorStart() API change especially
> > that pertaining to portals does not seem airtight. I'm especially
> > worried about moving the ExecutorStart() call for the
> > PORTAL_MULTI_QUERY case from where it is currently to PortalStart().
> > That requires additional bookkeeping in PortalData and I am not
> > totally sure that the snapshot handling changes after that move are
> > entirely correct.
>
> Here again, it would help to see exactly what you had to do and what
> consequences you think it might have. But it sounds like you're
> talking about moving ExecutorStart() from PortalStart() to PortalRun()
> and I agree that sounds like it might have user-visible behavioral
> consequences that we don't want.

Let's look specifically at this block of code in PortalRunMulti():

/*
 * Must always have a snapshot for plannable queries. First time
 * through, take a new snapshot; for subsequent queries in the
 * same portal, just update the snapshot's copy of the command
 * counter.
 */
if (!active_snapshot_set)
{
    Snapshot    snapshot = GetTransactionSnapshot();

    /* If told to, register the snapshot and save in portal */
    if (setHoldSnapshot)
    {
        snapshot = RegisterSnapshot(snapshot);
        portal->holdSnapshot = snapshot;
    }

    /*
     * We can't have the holdSnapshot also be the active one,
     * because UpdateActiveSnapshotCommandId would complain. So
     * force an extra snapshot copy. Plain PushActiveSnapshot
     * would have copied the transaction snapshot anyway, so this
     * only adds a copy step when setHoldSnapshot is true. (It's
     * okay for the command ID of the active snapshot to diverge
     * from what holdSnapshot has.)
     */
    PushCopiedSnapshot(snapshot);

    /*
     * As for PORTAL_ONE_SELECT portals, it does not seem
     * necessary to maintain portal->portalSnapshot here.
     */

    active_snapshot_set = true;
}
else
    UpdateActiveSnapshotCommandId();

Without the patch, the code immediately following this does a
CreateQueryDesc(), which "registers" the above copied snapshot,
followed by ExecutorStart() immediately followed by ExecutorRun(), for
each query in the list in the PORTAL_MULTI_QUERY case.

With the patch, CreateQueryDesc() and ExecutorStart() are moved to
PortalStart() so that the QueryDescs, including the PlanState trees,
for all queries are built before any of them is run. Why? So that if
ExecutorStart() fails for any query in the list, we can simply throw
out the QueryDescs and PlanState trees of the previous queries (NOT
run them) and ask plancache.c for a new CachedPlan for the list of
queries. We don't have a way to ask plancache.c to replan only a
single query in the list.

Because of that reshuffling, the above block also needed to be moved
to PortalStart() along with the CommandCounterIncrement() between
queries. That requires the following non-trivial changes:

* A copy of the snapshot needs to be created for each statement after
the first, so that UpdateActiveSnapshotCommandId() can be performed on
it.

* In PortalRunMulti(), PushActiveSnapshot() must now be done for every
query because the executor expects the copy in the given query's
QueryDesc to match the ActiveSnapshot.

* There's no longer a CCI() between queries in PortalRunMulti(),
because the snapshot in each query's QueryDesc must already have been
adjusted to reflect the correct command counter. I've checked, but
can't really be sure, that the value in the snapshot is all anyone
ever uses when they want the current value of the command counter.

There is likely to be a performance regression in the multi-query case
due to this handling of snapshots, and maybe even correctness issues.

> > 3. The need to add *back* the fields to store the RT indexes of
> > relations that are not looked at by ExecInitNode() traversal such as
> > root partitioned tables and non-leaf partitions.
>
> I don't remember exactly why we removed those or what the benefit was,
> so I'm not sure how big of a problem it is if we have to put them
> back.

We removed those in commit 52ed730d511b after commit f2343653f5b2
removed redundant execution-time locking of non-leaf relations. So we
removed them because we realized that execution-time locking is
unnecessary given that AcquireExecutorLocks() exists, and now we want
to add them back because we'd like to get rid of
AcquireExecutorLocks(). :-)

I'm attaching a rebased version of the patch implementing the current
design, because the cfbot has been broken for a while and also in case
you or anyone else wants to take another look. I've combined two
patches into one: one dealing with the executor-side changes to
account for locking, and another dealing with the caller-side changes
to handle the executor returning when the CachedPlan becomes invalid.

--
Thanks, Amit Langote

Attachment Content-Type Size
v50-0001-Assorted-tightening-in-various-ExecEnd-routines.patch application/octet-stream 31.3 KB
v50-0002-Add-field-to-store-parent-relids-to-Append-Merge.patch application/octet-stream 24.5 KB
v50-0003-Assert-that-relations-needing-their-permissions-.patch application/octet-stream 2.2 KB
v50-0004-Preparations-to-allow-executor-to-take-locks-in-.patch application/octet-stream 98.2 KB
v50-0005-Don-t-lock-child-tables-in-GetCachedPlan.patch application/octet-stream 24.0 KB
v50-0006-Track-opened-range-table-relations-in-a-List-in-.patch application/octet-stream 2.5 KB
