Re: MERGE ... RETURNING

From: Gurjeet Singh <gurjeet(at)singh(dot)im>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: MERGE ... RETURNING
Date: 2023-07-11 20:43:13
Message-ID: CABwTF4V2H9enRQQu5qCxhSOF49Z8ODMOcU2P6e5uSWGWTE-N5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 7, 2023 at 3:48 PM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> On Thu, 6 Jul 2023 at 06:13, Gurjeet Singh <gurjeet(at)singh(dot)im> wrote:
> >
> > > One minor annoyance is that, due to the way that pg_merge_action() and
> > > pg_merge_when_clause() require access to the MergeActionState, they
> > > only work if they appear directly in the RETURNING list. They can't,
> > > for example, appear in a subquery in the RETURNING list, and I don't
> > > see an easy way round that limitation.
> >
> > I believe that's a serious limitation, and would be a blocker for the feature.
>
> Yes, that was bugging me for quite a while.
>
> Attached is a new version that removes that restriction, allowing the
> merge support functions to appear anywhere. This requires a bit of
> planner support, to deal with merge support functions in subqueries,
> in a similar way to how aggregates and GROUPING() expressions are
> handled. But a number of the changes from the previous patch are no
> longer necessary, so overall, this version of the patch is slightly
> smaller.

+1

> > I think the name of function pg_merge_when_clause() can be improved.
> > How about pg_merge_when_clause_ordinal().
> >
>
> That's a bit of a mouthful, but I don't have a better idea right now.
> I've left the names alone for now, in case something better occurs to
> anyone.

+1. How do we make sure we don't forget that it needs to be named
better. Perhaps a TODO item within the patch will help.

> > In the doc page of MERGE, you've moved the 'with_query' from the
> > bottom of Parameters section to the top of that section. Any reason
> > for this? Since the Parameters section is quite large, for a moment I
> > thought the patch was _adding_ the description of 'with_query'.
> >
>
> Ah yes, I was just making the order consistent with the
> INSERT/UPDATE/DELETE pages. That could probably be committed
> separately.

I don't think that's necessary, if it improves consistency with related docs.

> > + /*
> > + * Merge support functions should only be called directly from a MERGE
> > + * command, and need access to the parent ModifyTableState. The parser
> > + * should have checked that such functions only appear in the RETURNING
> > + * list of a MERGE, so this should never fail.
> > + */
> > + if (IsMergeSupportFunction(funcid))
> > + {
> > + if (!state->parent ||
> > + !IsA(state->parent, ModifyTableState) ||
> > + ((ModifyTableState *) state->parent)->operation != CMD_MERGE)
> > + elog(ERROR, "merge support function called in non-merge context");
> >
> > As the comment says, this is an unexpected condition, and should have
> > been caught and reported by the parser. So I think it'd be better to
> > use an Assert() here. Moreover, there's ERROR being raised by the
> > pg_merge_action() and pg_merge_when_clause() functions, when they
> > detect the function context is not appropriate.
> >
> > I found it very innovative to allow these functions to be called only
> > in a certain part of certain SQL command. I don't think there's a
> > precedence for this in Postgres; I'd be glad to learn if there are
> > other functions that Postgres exposes this way.
> >
> > - EXPR_KIND_RETURNING, /* RETURNING */
> > + EXPR_KIND_RETURNING, /* RETURNING in INSERT/UPDATE/DELETE */
> > + EXPR_KIND_MERGE_RETURNING, /* RETURNING in MERGE */
> >
> > Having to invent a whole new ParseExprKind enum, feels like a bit of
> > an overkill. My only objection is that this is exactly like
> > EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt
> > with exactly in as many places. But I don't have any alternative
> > proposals.
> >
>
> That's all gone now from the new patch, since there is no longer any
> restriction on where these functions can appear.

I believe this elog can be safely turned into an Assert.

+ switch (mergeActionCmdType)
{
- CmdType commandType = relaction->mas_action->commandType;
....
+ case CMD_INSERT:
....
+ default:
+ elog(ERROR, "unrecognized commandType: %d", (int)
mergeActionCmdType);

The same treatment can be applied to the elog(ERROR) in pg_merge_action().

> > +CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int,
> > + OUT action text, OUT tid int,
> > OUT new_balance int)
> > +LANGUAGE sql AS
> > +$$
> > + MERGE INTO sq_target t
> > + USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta)
> > + ON tid = v.sid
> > + WHEN MATCHED AND tid > 2 THEN
> >
> > Again, for consistency, the comparison operator should be >=. There
> > are a few more occurrences of this comparison in the rest of the file,
> > that need the same treatment.
> >
>
> I changed the new tests to use ">= 2" (and the COPY test now returns 3
> rows, with an action of each type, which is easier to read), but I
> don't think it's really necessary to change all the existing tests
> from "> 2". There's nothing wrong with the "= 2" case having no
> action, as long as the tests give decent coverage.

I was just trying to drive these tests towards a consistent pattern.
As a reader, if I see these differences, the first and the
conservative thought I have is that these differences must be there
for a reason, and then I have to work to find out what those reasons
might be. And that's a lot of wasted effort, just in case someone
intends to change something in these tests.

I performed this round of review by comparing the diff between the v7
and v8 patches (after applying to commit 4f4d73466d)

-ExecProcessReturning(ResultRelInfo *resultRelInfo,
+ExecProcessReturning(ModifyTableContext *context,
+ ResultRelInfo *resultRelInfo,
...
+ TupleTableSlot *rslot;
...
+ if (context->relaction)
+ {
...
+ PG_TRY();
+ {
+ rslot = ExecProject(projectReturning);
+ }
+ PG_FINALLY();
+ {
+ mergeActionCmdType = saved_mergeActionCmdType;
+ mergeActionIdx = saved_mergeActionIdx;
+ }
+ PG_END_TRY();
+ }
+ else
+ rslot = ExecProject(projectReturning);
+
+ return rslot;

In the above hunk, if there's an exception/ERROR, I believe we should
PG_RE_THROW(). If there's a reason to continue, we should at least set
rslot = NULL, otherwise we may be returning an uninitialized value to
the caller.

{ oid => '9499', descr => 'command type of current MERGE action',
- proname => 'pg_merge_action', provolatile => 'v',
+ proname => 'pg_merge_action', provolatile => 'v', proparallel => 'r',
....
{ oid => '9500', descr => 'index of current MERGE WHEN clause',
- proname => 'pg_merge_when_clause', provolatile => 'v',
+ proname => 'pg_merge_when_clause', provolatile => 'v', proparallel => 'r',
....

I see that you've now set proparallel of these functions to 'r'. I'd
just like to understand how you got to that conclusion.

--- error when using MERGE support functions outside MERGE
-SELECT pg_merge_action() FROM sq_target;

I believe it would be worthwhile to keep a record of the expected
outputs of these invocations in the tests, just in case they change
over time.

Best regards,
Gurjeet
http://Gurje.et

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-07-11 20:45:56 Re: unrecognized node type while displaying a Path due to dangling pointer
Previous Message Gurjeet Singh 2023-07-11 19:01:10 Re: Document that server will start even if it's unable to open some TCP/IP ports