Adding OLD/NEW support to RETURNING

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Adding OLD/NEW support to RETURNING
Date: 2023-12-04 12:14:42
Message-ID: CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have been playing around with the idea of adding support for OLD/NEW
to RETURNING, partly motivated by the discussion on the MERGE
RETURNING thread [1], but also because I think it would be a very
useful addition for other commands (UPDATE in particular).

This was discussed a long time ago [2], but that previous discussion
didn't lead to a workable patch, and so I have taken a different
approach here.

My first thought was that this would only really make sense for UPDATE
and MERGE, since OLD/NEW are pretty pointless for INSERT/DELETE
respectively. However...

1. For an INSERT with an ON CONFLICT ... DO UPDATE clause, returning
OLD might be very useful, since it provides a way to see which rows
conflicted, and return the old conflicting values.

2. If a DELETE is turned into an UPDATE by a rule (e.g., to mark rows
as deleted, rather than actually deleting them), then returning NEW
can also be useful. (I admit, this is a somewhat obscure use case, but
it's still possible.)

3. In a MERGE, we need to be able to handle all 3 command types anyway.

4. It really isn't any extra effort to support INSERT and DELETE.

So in the attached very rough patch (no docs, minimal testing) I have
just allowed OLD/NEW in RETURNING for all command types (except, I
haven't done MERGE here - I think that's best kept as a separate
patch). If there is no OLD/NEW row in a particular context, it just
returns NULLs. The regression tests contain examples of 1 & 2 above.

Based on Robert Haas' suggestion in [2], the patch works by adding a
new "varreturningtype" field to Var nodes. This field is set during
parse analysis of the returning clause, which adds new namespace
aliases for OLD and NEW, if tables with those names/aliases are not
already present. So the resulting Var nodes have the same
varno/varattno as they would normally have had, but a different
varreturningtype.

For the most part, the rewriter and parser are then untouched, except
for a couple of places necessary to ensure that the new field makes it
through correctly. In particular, none of this affects the shape of
the final plan produced. All of the work to support the new Var
returning type is done in the executor.

This turns out to be relatively straightforward, except for
cross-partition updates, which was a little trickier since the tuple
format of the old row isn't necessarily compatible with the new row,
which is in a different partition table and so might have a different
column order.

One thing that I've explicitly disallowed is returning OLD/NEW for
updates to foreign tables. It's possible that could be added in a
later patch, but I have no plans to support that right now.

One difficult question is what names to use for the new aliases. I
think OLD and NEW are the most obvious and natural choices. However,
there is a problem - if they are used in a trigger function, they will
conflict. In PL/pgSQL, this leads to an error like the following:

ERROR: column reference "new.f1" is ambiguous
LINE 3: RETURNING new.f1, new.f4
^
DETAIL: It could refer to either a PL/pgSQL variable or a table column.

That's the same error that you'd get if a different alias name had
been chosen, and it happened to conflict with a user-defined PL/pgSQL
variable, except that in that case, the user could just change their
variable name to fix the problem, which is not possible with the
automatically-added OLD/NEW trigger variables. As a way round that, I
added a way to optionally change the alias used in the RETURNING list,
using the following syntax:

RETURNING [ WITH ( { OLD | NEW } AS output_alias [, ...] ) ]
* | output_expression [ [ AS ] output_name ] [, ...]

for example:

RETURNING WITH (OLD AS o) o.id, o.val, ...

I'm not sure how good a solution that is, but the syntax doesn't look
too bad to me (somewhat reminiscent of a WITH-query), and it's only
necessary in cases where there is a name conflict.

The simpler solution would be to just pick different alias names to
start with. The previous thread seemed to settle on BEFORE/AFTER, but
I don't find those names particularly intuitive or appealing. Over on
[1], PREVIOUS/CURRENT was suggested, which I prefer, but they still
don't seem as natural as OLD/NEW.

So, as is often the case, naming things turns out to be the hardest
problem, which is why I quite like the idea of letting the user pick
their own name, if they need to. In most contexts, OLD and NEW will
work, so they won't need to.

Thoughts?

Regards,
Dean

[1] https://www.postgresql.org/message-id/flat/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg(at)mail(dot)gmail(dot)com
[2] https://www.postgresql.org/message-id/flat/51822C0F.5030807%40gmail.com

Attachment Content-Type Size
support-returning-old-new-v0.patch text/x-patch 99.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-12-04 12:41:20 Re: Emitting JSON to file using COPY TO
Previous Message Amit Kapila 2023-12-04 12:11:34 Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION