Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-12-10 08:20:22
Message-ID: CAJcOf-fKb3hnec7pe2s8XUkvW99Wyxgu9cNLCj2DCSHufW1AWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
>
> /*
> + * PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> +
> + if (IsModifySupportedInParallelMode(commandType))
> + {
> + /*
> + * Prepare for entering parallel mode by assigning a
> + * FullTransactionId, to be included in the transaction state that is
> + * serialized in the parallel DSM.
> + */
> + (void) GetCurrentTransactionId();
> + }
> +}
>
> Why do we need to serialize the transaction ID for 0001? I mean in
> 0001 we are just allowing the SELECT to be executed in parallel so why
> we would need the transaction Id for that. I agree that we would need
> this once we try to perform the Insert also from the worker in the
> remaining patches.
>

There's a very good reason. It's related to parallel-mode checks for
Insert and how the XID is lazily acquired if required.
When allowing SELECT to be executed in parallel, we're in
parallel-mode and the leader interleaves Inserts with retrieval of the
tuple data from the workers.
You will notice that heap_insert() calls GetTransactionId() as the
very first thing it does. If the FullTransactionId is not valid,
AssignTransactionId() is then called, which then executes this code:

/*
* Workers synchronize transaction state at the beginning of each parallel
* operation, so we can't account for new XIDs at this point.
*/
if (IsInParallelMode() || IsParallelWorker())
elog(ERROR, "cannot assign XIDs during a parallel operation");

So that code (currently) has no way of knowing that a XID is being
(lazily) assigned at the beginning, or somewhere in the middle of, a
parallel operation.
This is the reason why PrepareParallelMode() is calling
GetTransactionId() up-front, to ensure a FullTransactionId is assigned
up-front, prior to parallel-mode (so then there won't be an attempted
XID assignment).

If you remove the GetTransactionId() call from PrepareParallelMode()
and run "make installcheck-world" with "force_parallel_mode=regress"
in effect, many tests will fail with:
ERROR: cannot assign XIDs during a parallel operation

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2020-12-10 08:28:56 RE: [Patch] Optimize dropping of relation buffers using dlist
Previous Message k.jamison@fujitsu.com 2020-12-10 08:09:55 RE: [Patch] Optimize dropping of relation buffers using dlist