Re: Adding OLD/NEW support to RETURNING

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding OLD/NEW support to RETURNING
Date: 2024-03-27 13:19:31
Message-ID: CAEZATCUbx4nuysjkzzPmRCcJKqyntHPe-W1hMobYkpqvxto7kA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 27 Mar 2024 at 07:47, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> This isn't a complete review, but I spent a while looking at this, and
> it looks like it's in good shape.

Thanks for looking.

> I like the syntax, and I think the solution for renaming the alias
> ("RETURNING WITH (new as n, old as o)") is a good one.

Thanks, that's good to know. Settling on a good syntax can be
difficult, so it's good to know that people are generally supportive
of this.

> The implementation touches quite a few areas. How did you identify all
> of the potential problem areas?

Hmm, well that's one of the hardest parts, and it's really difficult
to be sure that I have.

Initially, when I was just adding a new field to Var, I just tried to
look at all the existing code that made Vars, or copied other
non-default fields like varnullingrels around. I still managed to miss
the necessary change in assign_param_for_var() on my first attempt,
but fortunately that was an easy fix.

More worrying was the fact that I managed to completely overlook the
fact that I needed to worry about non-updatable columns in
auto-updatable views until v6, which added the ReturningExpr node.
Once I realised that I needed that, and that it needed to be tied to a
particular query level, and so needed a "levelsup" field, I just
looked at GroupingFunc to identify the places in code that needed to
be updated to do the right thing for a query-level-aware node.

What I'm most worried about now is that there are other areas of
functionality like that, that I'm overlooking, and which will interact
with this feature in non-trivial ways.

> It seems the primary sources of
> complexity came from rules, partitioning, and updatable views, is that
> right?

Foreign tables looked like it would be tricky at first, but then
turned out to be trivial, after disallowing direct-modify when
returning old/new.

Rules are a whole area that I wish I didn't have to worry about (I
wish we had deprecated them a long time ago). In practice though, I
haven't done much beyond what seemed like the most obvious (and
simplest) thing.

Nonetheless, there are some interesting interactions that probably
need more careful examination. For example, the fact that the
RETURNING clause in a RULE already has its own "special table names"
OLD and NEW, which are actually references to different RTEs, unlike
the OLD and NEW that this patch introduces, which are references to
the result relation. This leads to a number of different cases:

Case 1
======

In the simplest case, the rule can simply contain "RETURNING *". This
leads to what I think is the most obvious and intuitive behaviour:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
DO INSTEAD UPDATE t1 SET val1 = NEW.val2
RETURNING *;

UPDATE t2 SET val2 = 'New value 2'
RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
t2.val2 AS t2_val2, val2;

old_val2 | new_val2 | t2_val2 | val2
-------------+-------------+-------------+-------------
Old value 1 | New value 2 | New value 2 | New value 2
(1 row)

So someone using the table with the rule can access old and new values
in the obvious way, and they will get new values by default for an
UPDATE.

The query plan for this is pretty-much what you'd expect:

QUERY PLAN
-------------------------------------------------------
Update on public.t1
Output: old.val1, new.val1, t1.val1, t1.val1
-> Nested Loop
Output: 'New value 2'::text, t1.ctid, t2.ctid
-> Seq Scan on public.t1
Output: t1.ctid
-> Materialize
Output: t2.ctid
-> Seq Scan on public.t2
Output: t2.ctid

Case 2
======

If the rule contains "RETURNING OLD.*", it means that the RETURNING
list of the rewritten query contains Vars that no longer refer to the
result relation, but instead refer to the old data in t2. This leads
the the following behaviour:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
DO INSTEAD UPDATE t1 SET val1 = NEW.val2
RETURNING OLD.*;

UPDATE t2 SET val2 = 'New value 2'
RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
t2.val2 AS t2_val2, val2;

old_val2 | new_val2 | t2_val2 | val2
-------------+-------------+-------------+-------------
Old value 2 | Old value 2 | Old value 2 | Old value 2
(1 row)

The reason this happens is that the Vars in the returning list don't
refer to the result relation, and so setting varreturningtype on them
has no effect, and is simply ignored. This can be seen by looking at
the query plan:

QUERY PLAN
----------------------------------------------------------------
Update on public.t1
Output: old.(t2.val2), new.(t2.val2), t2.val2, t2.val2
-> Nested Loop
Output: 'New value 2'::text, t1.ctid, t2.ctid, t2.val2
-> Seq Scan on public.t1
Output: t1.ctid
-> Materialize
Output: t2.ctid, t2.val2
-> Seq Scan on public.t2
Output: t2.ctid, t2.val2

So all the final output values come from t2, not the result relation t1.

Case 3
======

Similarly, if the rule contains "RETURNING NEW.*", the effect is
similar, because again, the Vars in the RETURNING list don't refer to
the result relation in the rewritten query:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
DO INSTEAD UPDATE t1 SET val1 = NEW.val2
RETURNING NEW.*;

UPDATE t2 SET val2 = 'New value 2'
RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
t2.val2 AS t2_val2, val2;

old_val2 | new_val2 | t2_val2 | val2
-------------+-------------+-------------+-------------
New value 2 | New value 2 | New value 2 | New value 2
(1 row)

This time, the query plan shows that the result values are coming from
the new source values:

QUERY PLAN
----------------------------------------------------------------------------------------------------------
Update on public.t1
Output: old.('New value 2'::text), new.('New value 2'::text), 'New
value 2'::text, 'New value 2'::text
-> Nested Loop
Output: 'New value 2'::text, t1.ctid, t2.ctid
-> Seq Scan on public.t1
Output: t1.ctid
-> Materialize
Output: t2.ctid
-> Seq Scan on public.t2
Output: t2.ctid

Case 4
======

It's also possible to use the new returning old/new syntax in the
rule, by defining custom aliases. This has a subtly different meaning,
because it indicates that Vars in the rewritten query should refer to
the result relation, with varreturningtype set accordingly. So, for
example, returning old in the rule using this technique leads to the
following behaviour:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
DO INSTEAD UPDATE t1 SET val1 = NEW.val2
RETURNING WITH (OLD AS o) o.*;

UPDATE t2 SET val2 = 'New value 2'
RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
t2.val2 AS t2_val2, val2;

old_val2 | new_val2 | t2_val2 | val2
-------------+-------------+-------------+-------------
Old value 1 | New value 2 | Old value 1 | Old value 1
(1 row)

The query plan for this indicates that all returned values now come
from the result relation, but the default is to return old values
rather than new values, and it now allows that default to be
overridden:

QUERY PLAN
-------------------------------------------------------
Update on public.t1
Output: old.val1, new.val1, old.val1, old.val1
-> Nested Loop
Output: 'New value 2'::text, t1.ctid, t2.ctid
-> Seq Scan on public.t1
Output: t1.ctid
-> Materialize
Output: t2.ctid
-> Seq Scan on public.t2
Output: t2.ctid

Case 5
======

Similarly, the rule can use the new syntax to return new values:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
DO INSTEAD UPDATE t1 SET val1 = NEW.val2
RETURNING WITH (NEW AS n) n.*;

UPDATE t2 SET val2 = 'New value 2'
RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
t2.val2 AS t2_val2, val2;

old_val2 | new_val2 | t2_val2 | val2
-------------+-------------+-------------+-------------
Old value 1 | New value 2 | New value 2 | New value 2
(1 row)

which is the same result as case 1, but with a slightly different query plan:

QUERY PLAN
-------------------------------------------------------
Update on public.t1
Output: old.val1, new.val1, new.val1, new.val1
-> Nested Loop
Output: 'New value 2'::text, t1.ctid, t2.ctid
-> Seq Scan on public.t1
Output: t1.ctid
-> Materialize
Output: t2.ctid
-> Seq Scan on public.t2
Output: t2.ctid

This explicitly sets the defaults for "t2.val2" and "val2"
unqualified, whereas in case 1 they were the implicit defaults for an
UPDATE command.

I think that all that is probably reasonable, but it definitely needs
documenting, which I haven't attempted yet.

Overall, I'm pretty hesitant to try to commit this to v17. Aside from
the fact that there's a lot of new code that hasn't had much in the
way of review or discussion, I also feel that I probably haven't fully
considered all areas where additional complexity might arise. It
doesn't seem like that long ago that this was just a prototype, and
it's certainly not that long ago that I had to add a substantial
amount of new code to deal with the auto-updatable view case that I
had completely overlooked.

So on reflection, rather than trying to rush to get this into v17, I
think it would be better to leave it to v18.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maiquel Grassi 2024-03-27 13:23:32 RE: Psql meta-command conninfo+
Previous Message Pavel Borisov 2024-03-27 12:54:51 Re: Table AM Interface Enhancements