From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-09-24 02:21:17
Message-ID: 20200924022117.x22cdky5fy5cwljh@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2020-09-22 14:55:21 +1000, Greg Nancarrow wrote:
> Following on from Dilip Kumar's POC patch for allowing parallelism of
> the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
> patch for allowing parallelism of both the INSERT and SELECT parts,
> where it can be allowed.
Cool!
I think it'd be good if you outlined what your approach is to make this
safe.
> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.
I think it'd be good to do this part separately and first, independent
of whether the insert part can be parallelized.
> Obviously I've had to update the planner and executor and
> parallel-worker code to make this happen, hopefully not breaking too
> many things along the way.
Hm, it looks like you've removed a fair number of checks; it's not clear to
me why that's safe in each instance.
> - Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
> INTO ... VALUES ..." would need additional Table AM functions for
> dividing up the INSERT work amongst the workers (currently only exists
> for scans).
Hm, not entirely following. What precisely are you thinking of here?
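My guess is you mean a VALUES-row counterpart to the parallel scan
machinery, i.e. shared state plus a "hand out the next row" helper, roughly
like the sketch below. To be clear, the names are entirely made up; nothing
like this exists in tableam.h today, it's only meant to check whether I
understand the idea:

/*
 * Entirely hypothetical sketch, only to illustrate "dividing up the INSERT
 * work amongst the workers" for INSERT ... VALUES; none of these names
 * exist in core.  The shared state would live in the parallel DSM segment,
 * and each worker atomically claims the next VALUES row, much as parallel
 * scans hand out blocks.  Needs "port/atomics.h".
 */
typedef struct ParallelValuesInsertState
{
    pg_atomic_uint32 next_row;  /* index of the next VALUES row to hand out */
    uint32      nrows;          /* total number of VALUES rows */
} ParallelValuesInsertState;

/* worker side: claim one row; returns false once the list is exhausted */
static bool
parallel_values_next_row(ParallelValuesInsertState *pvs, uint32 *row)
{
    uint32      claimed = pg_atomic_fetch_add_u32(&pvs->next_row, 1);

    if (claimed >= pvs->nrows)
        return false;

    *row = claimed;
    return true;
}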
Either way, I doubt it's really worth adding parallelism support for INSERT
... VALUES; the cost of spawning workers will almost always be higher than
the benefit.
> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
> TupleDesc toasttupDesc;
> Datum t_values[3];
> bool t_isnull[3];
> - CommandId mycid = GetCurrentCommandId(true);
> + CommandId mycid = GetCurrentCommandId(!IsParallelWorker());
> struct varlena *result;
> struct varatt_external toast_pointer;
> union
Hm? Why do we need this in the various places you have made this change?
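Presumably it's to dodge the assertion in GetCurrentCommandId() in xact.c,
which from memory looks roughly like this:

CommandId
GetCurrentCommandId(bool used)
{
    /* this is global to a transaction, not subtransaction-local */
    if (used)
    {
        /*
         * Forbid setting currentCommandIdUsed in a parallel worker, because
         * we have no provision for communicating this back to the leader.
         */
        Assert(!IsParallelWorker());
        currentCommandIdUsed = true;
    }
    return currentCommandId;
}

But that only explains the mechanics; it doesn't tell me whether it's
actually OK for a worker to skip marking the CommandId as used here.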
> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 1585861..94c8507 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
> * inserts in general except for the cases where inserts generate a new
> * CommandId (eg. inserts into a table having a foreign key column).
> */
> - if (IsParallelWorker())
> - ereport(ERROR,
> - (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> - errmsg("cannot insert tuples in a parallel worker")));
> -
I'm afraid that this weakens our checks more than I'd like. What if this
ends up being invoked from inside C code?
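If the executor really does need workers to get past this, I'd much rather
see something narrower than deleting the check outright, e.g. along these
lines (sketch only, the flag is invented):

/*
 * Sketch only -- ParallelInsertAllowed() would have to be some piece of
 * state the executor sets when it has deliberately arranged for workers to
 * insert; no such flag exists today.  The point is to keep the guard for
 * any other C code that reaches heap_prepare_insert() from a worker.
 */
if (IsParallelWorker() && !ParallelInsertAllowed())
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("cannot insert tuples in a parallel worker")));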
> @@ -822,19 +822,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
> isdead = false;
> break;
> case HEAPTUPLE_INSERT_IN_PROGRESS:
> -
> /*
> * Since we hold exclusive lock on the relation, normally the
> * only way to see this is if it was inserted earlier in our
> * own transaction. However, it can happen in system
> * catalogs, since we tend to release write lock before commit
> - * there. Give a warning if neither case applies; but in any
> - * case we had better copy it.
> + * there. In any case we had better copy it.
> */
> - if (!is_system_catalog &&
> - !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
> - elog(WARNING, "concurrent insert in progress within table \"%s\"",
> - RelationGetRelationName(OldHeap));
> +
> /* treat as live */
> isdead = false;
> break;
> @@ -1434,16 +1429,11 @@ heapam_index_build_range_scan(Relation heapRelation,
> * the only way to see this is if it was inserted earlier
> * in our own transaction. However, it can happen in
> * system catalogs, since we tend to release write lock
> - * before commit there. Give a warning if neither case
> - * applies.
> + * before commit there.
> */
> xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
> if (!TransactionIdIsCurrentTransactionId(xwait))
> {
> - if (!is_system_catalog)
> - elog(WARNING, "concurrent insert in progress within table \"%s\"",
> - RelationGetRelationName(heapRelation));
> -
> /*
> * If we are performing uniqueness checks, indexing
> * such a tuple could lead to a bogus uniqueness
Huh, I don't think this should be necessary?
> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> index a4944fa..9d3f100 100644
> --- a/src/backend/access/transam/varsup.c
> +++ b/src/backend/access/transam/varsup.c
> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
> TransactionId xid;
>
> /*
> - * Workers synchronize transaction state at the beginning of each parallel
> - * operation, so we can't account for new XIDs after that point.
> - */
> - if (IsInParallelMode())
> - elog(ERROR, "cannot assign TransactionIds during a parallel operation");
> -
> - /*
> * During bootstrap initialization, we return the special bootstrap
> * transaction id.
> */
Same thing: this code can't simply be allowed to become reachable in
parallel mode. What prevents two different workers from each assigning
their own xid, etc.?
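The only way I can see this being sane is if the xid is guaranteed to be
assigned in the leader before any workers are launched, so no worker ever
needs a new one -- something along these lines in the leader (just a sketch
of the idea):

/*
 * Sketch: in the leader, before entering parallel mode for a plan that may
 * insert in workers.  Both calls exist today; forcing xid and command id
 * assignment here means the serialized transaction state handed to workers
 * already contains them, so no worker should ever reach
 * GetNewTransactionId()/AssignTransactionId().
 */
(void) GetCurrentTransactionId();
(void) GetCurrentCommandId(true);

EnterParallelMode();
/* ... launch workers as usual ... */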
> @@ -577,13 +608,6 @@ AssignTransactionId(TransactionState s)
> Assert(s->state == TRANS_INPROGRESS);
>
> /*
> - * Workers synchronize transaction state at the beginning of each parallel
> - * operation, so we can't account for new XIDs at this point.
> - */
> - if (IsInParallelMode() || IsParallelWorker())
> - elog(ERROR, "cannot assign XIDs during a parallel operation");
> -
> - /*
> * Ensure parent(s) have XIDs, so that a child always has an XID later
> * than its parent. Mustn't recurse here, or we might get a stack
> * overflow if we're at the bottom of a huge stack of subtransactions none
Ditto.
Greetings,
Andres Freund