| From: | Andres Freund <andres(at)anarazel(dot)de> | 
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> | 
| Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Daniel Westermann <dwe(at)dbi-services(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec> | 
| Subject: | Re: support for MERGE | 
| Date: | 2022-01-28 22:19:33 | 
| Message-ID: | 20220128221933.qgu4n7tphdiryjt3@alap3.anarazel.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
On 2022-01-28 17:27:37 -0300, Alvaro Herrera wrote:
> MERGE, v10.  I am much more comfortable with this version; I have
> removed a bunch of temporary hacks and cleaned up the interactions with
> table AM and executor, which is something that had been bothering me for
> a while.  The complete set of changes can be seen in github,
> https://github.com/alvherre/postgres/commits/merge-15
Any chance you could split this into something more reviewable? The overall
diff stats are: 102 files changed, 8589 insertions(+), 234 deletions(-) thats
pretty hard to really review. Incremental commits don't realy help with that.
> I am not aware of anything of significance in terms of remaining work
> for this project.
One thing from skimming: There's not enough documentation about the approach
imo - it's a complicated enough feature to deserve more than the one paragraph
in src/backend/executor/README.
I'm still [1],[2] uncomfortable with execMerge.c kinda-but-not-really being an
executor node.
> The one thing I'm a bit bothered about is the fact that we expose a lot of
> executor functions previously static.  I am now wondering if it would be
> better to move the MERGE executor support functions into nodeModifyTable.c,
> which I think would mean we would not have to expose those function
> prototypes.
I'm worried about the complexity of nodeModifyTuple.c as is. Moving more code
in there does not seem like the right call to me - we should do the opposite
if anything.
A few inline comments below. No real review, just stuff noticed while skimming
to see where this is at.
Greetings,
Andres
[1] https://www.postgresql.org/message-id/20180405200003.gar3j26gsk32gqpe@alap3.anarazel.de
[2] https://www.postgresql.org/message-id/20180403021800.b5nsgiclzanobiup@alap3.anarazel.de
> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> index 1a9c1ac290..280ac40e63 100644
> --- a/src/backend/commands/trigger.c
> +++ b/src/backend/commands/trigger.c
This stuff seems like one candidate for splitting out.
> +	/*
> +	 * We maintain separate transition tables for UPDATE/INSERT/DELETE since
> +	 * MERGE can run all three actions in a single statement. Note that UPDATE
> +	 * needs both old and new transition tables whereas INSERT needs only new,
> +	 * and DELETE needs only old.
> +	 */
> +
> +	/* "old" transition table for UPDATE, if any */
> +	Tuplestorestate *old_upd_tuplestore;
> +	/* "new" transition table for UPDATE, if any */
> +	Tuplestorestate *new_upd_tuplestore;
> +	/* "old" transition table for DELETE, if any */
> +	Tuplestorestate *old_del_tuplestore;
> +	/* "new" transition table INSERT, if any */
> +	Tuplestorestate *new_ins_tuplestore;
> +
>  	TupleTableSlot *storeslot;	/* for converting to tuplestore's format */
>  };
Do we need to do something slightly clever about the memory limits for the
tuplestore? Each of them having a separate work_mem makes nodeModifyTable.c
one memory hungry node (the worst of all maybe?).
> +
> +/*
> + * Perform MERGE.
> + */
> +TupleTableSlot *
> +ExecMerge(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
> +		  EState *estate, TupleTableSlot *slot)
> +{
> +	ExprContext *econtext = mtstate->ps.ps_ExprContext;
> +	ItemPointer tupleid;
> +	ItemPointerData tuple_ctid;
> +	bool		matched = false;
> +	Datum		datum;
> +	bool		isNull;
> +
> +	/*
> +	 * Reset per-tuple memory context to free any expression evaluation
> +	 * storage allocated in the previous cycle.
> +	 */
> +	ResetExprContext(econtext);
Hasn't this already happened in ExecModifyTable()?
> +	/*
> +	 * We run a JOIN between the target relation and the source relation to
> +	 * find a set of candidate source rows that have a matching row in the
> +	 * target table and a set of candidate source rows that do not have a
> +	 * matching row in the target table. If the join returns a tuple with the
> +	 * target relation's row-ID set, that implies that the join found a
> +	 * matching row for the given source tuple. This case triggers the WHEN
> +	 * MATCHED clause of the MERGE. Whereas a NULL in the target relation's
> +	 * row-ID column indicates a NOT MATCHED case.
> +	 */
> +	datum = ExecGetJunkAttribute(slot,
> +								 resultRelInfo->ri_RowIdAttNo,
> +								 &isNull);
Could this be put somewhere common with the equivalent call in
ExecModifyTable()?
> +	 * A concurrent update can:
> +	 *
> +	 * 1. modify the target tuple so that it no longer satisfies the
> +	 *    additional quals attached to the current WHEN MATCHED action
> +	 *
> +	 *    In this case, we are still dealing with a WHEN MATCHED case.
> +	 *    In this case, we recheck the list of WHEN MATCHED actions from
> +	 *    the start and choose the first one that satisfies the new target
> +	 *    tuple.
Duplicated "in this case"
> +				case TM_Invisible:
> +
> +					/*
> +					 * This state should never be reached since the underlying
> +					 * JOIN runs with a MVCC snapshot and EvalPlanQual runs
> +					 * with a dirty snapshot. So such a row should have never
> +					 * been returned for MERGE.
> +					 */
> +					elog(ERROR, "unexpected invisible tuple");
> +					break;
Seems like it was half duplicated at some point?
> +				case TM_Updated:
> +				case TM_Deleted:
> +
> +					/*
> +					 * The target tuple was concurrently updated/deleted by
> +					 * some other transaction.
> +					 *
> +					 * If the current tuple is the last tuple in the update
> +					 * chain, then we know that the tuple was concurrently
> +					 * deleted. Just return and let the caller try NOT MATCHED
> +					 * actions.
Is it really correct to behave that way when using repeatable read /
serializable?
> +				/*
> +				 * Project the tuple.  In case of a partitioned table, the
> +				 * projection was already built to use the root's descriptor,
> +				 * so we don't need to map the tuple here.
> +				 */
> +				actionInfo.actionState = action;
> +				insert_slot = ExecProject(action->mas_proj);
> +
> +				(void) ExecInsert(mtstate, rootRelInfo,
> +								  insert_slot, slot,
> +								  estate, &actionInfo,
> +								  mtstate->canSetTag);
> +				InstrCountFiltered1(&mtstate->ps, 1);
What happens if somebody concurrently inserts a conflicting row?
> --- a/src/include/executor/instrument.h
> +++ b/src/include/executor/instrument.h
> @@ -82,8 +82,11 @@ typedef struct Instrumentation
>  	double		ntuples;		/* total tuples produced */
>  	double		ntuples2;		/* secondary node-specific tuple counter */
>  	double		nloops;			/* # of run cycles for this node */
> -	double		nfiltered1;		/* # of tuples removed by scanqual or joinqual */
> -	double		nfiltered2;		/* # of tuples removed by "other" quals */
> +	double		nfiltered1;		/* # tuples removed by scanqual or joinqual OR
> +								 * # tuples inserted by MERGE */
> +	double		nfiltered2;		/* # tuples removed by "other" quals OR #
> +								 * tuples updated by MERGE */
> +	double		nfiltered3;		/* # tuples deleted by MERGE */
It was a bad idea to have nfiltered1/2. Arriving at nfiltered3, it seems we
really should rename them...
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Zhihong Yu | 2022-01-28 22:32:37 | Re: support for MERGE | 
| Previous Message | Tom Lane | 2022-01-28 22:16:32 | Re: Support tab completion for upper character inputs in psql |