From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Adding OLD/NEW support to RETURNING |
Date: | 2024-03-25 07:54:44 |
Message-ID: | CAEZATCWDGAd0mQgtXjgnEJ_Hf0v3hoMpQty7=nn9CFG3fc1HRA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, 18 Mar 2024 at 10:48, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> Rebased version attached, on top of c649fa24a4 (MERGE ... RETURNING support).
>
I have been doing more testing of this and I realised that there was a
problem -- the previous patch worked fine when updating a regular
table, so that old/new.colname is just a Var, but when updating an
auto-updatable view, "colname" could end up being replaced by an
arbitrary expression. In the cases I had tested before, that appeared
to work OK, but actually it wasn't right in all cases where the result
should have been NULL, due to the old/new row being absent (e.g., the
old row in an INSERT).
After thinking about that for a while, the best solution seemed to be
to add a new executable node, which I've called ReturningExpr. This
evaluates the old/new expression if the old/new row exists, but skips
it and returns NULL if the old/new row doesn't exist. The simplest
example is a query like this, which now returns what I would expect:
DROP TABLE IF EXISTS tt CASCADE;
CREATE TABLE tt (a int PRIMARY KEY, b text);
INSERT INTO tt VALUES (1, 'R1');
CREATE VIEW tv AS SELECT a, b, 'Const' c FROM tt;
INSERT INTO tv VALUES (1, 'Row 1'), (2, 'Row 2')
ON CONFLICT (a) DO UPDATE SET b = excluded.b
RETURNING old.*, new.*;
a | b | c | a | b | c
---+----+-------+---+-------+-------
1 | R1 | Const | 1 | Row 1 | Const
| | | 2 | Row 2 | Const
(2 rows)
(Previously that was returning old.c = 'Const' in both rows, because
the Const node has no old/new qualification.)
In EXPLAIN, I opted to display this as "old/new.(expression)", to make
it clear that the expression is being evaluated in the context of the
old/new row, even if it doesn't directly refer to old/new values from
the table. So, for example, the plan for the above query looks like
this:
QUERY PLAN
--------------------------------------------------------------------------------
Insert on public.tt
Output: old.a, old.b, old.('Const'::text), new.a, new.b, new.('Const'::text)
Conflict Resolution: UPDATE
Conflict Arbiter Indexes: tt_pkey
-> Values Scan on "*VALUES*"
Output: "*VALUES*".column1, "*VALUES*".column2
(It can't output "old.c" or "new.c" because all knowledge of the view
column "c" is gone by the time it has been through the rewriter, and
in any case, the details of the expression being evaluated are likely
to be useful in general.)
Things get more complicated when subqueries are involved. For example,
given this view definition:
CREATE VIEW tv AS SELECT a, b, (SELECT concat('b=',b)) c FROM tt;
the INSERT above produces this:
a | b | c | a | b | c
---+----+------+---+-------+---------
1 | R1 | b=R1 | 1 | Row 1 | b=Row 1
| | | 2 | Row 2 | b=Row 2
(2 rows)
which is as expected. This uses the following query plan:
QUERY PLAN
----------------------------------------------------------------------------
Insert on public.tt
Output: old.a, old.b, old.((SubPlan 1)), new.a, new.b, new.((SubPlan 2))
Conflict Resolution: UPDATE
Conflict Arbiter Indexes: tt_pkey
-> Values Scan on "*VALUES*"
Output: "*VALUES*".column1, "*VALUES*".column2
SubPlan 1
-> Result
Output: concat('b=', old.b)
SubPlan 2
-> Result
Output: concat('b=', new.b)
In this case "b" in the view subquery becomes "old.b" in SubPlan 1 and
"new.b" in SubPlan 2 (each with varlevelsup = 1, and therefore
evaluated as input params to the subplans). The concat() result would
normally always be non-NULL, but it (or rather the SubLink subquery
containing it) is wrapped in a ReturningExpr. As a result, SubPlan 1
is skipped in the second row, for which old does not exist, and ends
up only being executed once in that query, whereas SubPlan 2 is
executed twice.
Things get even more fiddly when the old/new expression itself appears
in a subquery. For example, given the following query:
INSERT INTO tv VALUES (1, 'Row 1'), (2, 'Row 2')
ON CONFLICT (a) DO UPDATE SET b = excluded.b
RETURNING old.a, old.b, (SELECT old.c), new.*;
the result is the same, but the query plan is now
QUERY PLAN
----------------------------------------------------------------------
Insert on public.tt
Output: old.a, old.b, (SubPlan 2), new.a, new.b, new.((SubPlan 3))
Conflict Resolution: UPDATE
Conflict Arbiter Indexes: tt_pkey
-> Values Scan on "*VALUES*"
Output: "*VALUES*".column1, "*VALUES*".column2
SubPlan 1
-> Result
Output: concat('b=', old.b)
SubPlan 2
-> Result
Output: (old.((SubPlan 1)))
SubPlan 3
-> Result
Output: concat('b=', new.b)
The ReturningExpr nodes belong to the query level containing the
RETURNING list (hence they have a "levelsup" field, like Var,
PlaceHolderVar, etc.). So in this example, one of the ReturningExpr
nodes is in SubPlan 2, with "levelsup" = 1, wrapping SubPlan 1, i.e.,
it only executes SubPlan 1 if the old row exists.
Although that all sounds quite complicated, all the individual pieces
are quite simple.
Attached is an updated patch in which I have also tidied up a few
other things, but I haven't read your latest review comments yet. I'll
respond to those separately.
Regards,
Dean
Attachment | Content-Type | Size |
---|---|---|
support-returning-old-new-v6.patch | text/x-patch | 212.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bertrand Drouvot | 2024-03-25 08:07:35 | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Previous Message | Bertrand Drouvot | 2024-03-25 07:35:45 | Re: Introduce XID age and inactive timeout based replication slot invalidation |