From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Another way to fix inherited UPDATE/DELETE |
Date: | 2019-02-19 21:48:55 |
Message-ID: | 357.1550612935@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
While contemplating the wreckage of
https://commitfest.postgresql.org/22/1778/
I had the beginnings of an idea of another way to fix that problem.
The issue largely arises from the fact that for UPDATE, we expect
the plan tree to emit a tuple that's ready to be stored back into
the target rel ... well, almost, because it also has a CTID or some
other row-identity column, so we have to do some work on it anyway.
But the point is this means we potentially need a different
targetlist for each child table in an inherited UPDATE.
What if we dropped that idea, and instead defined the plan tree as
returning only the columns that are updated by SET, plus the row
identity? It would then be the ModifyTable node's job to fetch the
original tuple using the row identity (which it must do anyway) and
form the new tuple by combining the updated columns from the plan
output with the non-updated columns from the original tuple.
DELETE would be even simpler, since it only needs the row identity
and nothing else.
Having done that, we could toss inheritance_planner into the oblivion
it so richly deserves, and just treat all types of inheritance or
partitioning queries as expand-at-the-bottom, as SELECT has always
done it.
Arguably, this would be more efficient even for non-inheritance join
situations, as less data (typically) would need to propagate through the
join tree. I'm not sure exactly how it'd shake out for trivial updates;
we might be paying for two tuple deconstructions not one, though perhaps
there's a way to finesse that. (One easy way would be to stick to the
old approach when there is no inheritance going on.)
In the case of a standard inheritance or partition tree, this seems to
go through really easily, since all the children could share the same
returned CTID column (I guess you'd also need a TABLEOID column so you
could figure out which table to direct the update back into). It gets
a bit harder if the tree contains some foreign tables, because they might
have different concepts of row identity, but I'd think in most cases you
could still combine those into a small number of output columns.
I have no idea how this might play with the pluggable-storage work.
Obviously this'd be a major rewrite with no chance of making it into v12,
but it doesn't sound too big to get done during v13.
Thoughts?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2019-02-19 22:08:45 | Re: Some thoughts on NFS |
Previous Message | David Rowley | 2019-02-19 21:07:10 | Re: Delay locking partitions during INSERT and UPDATE |