From: | Greg Nancarrow <gregn4422(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel INSERT (INTO ... SELECT ...) |
Date: | 2020-09-28 03:14:27 |
Message-ID: | CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c+Uu3XU5VMv37Aw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> I further checked on full txn id and command id. Yes, these are
> getting passed to workers via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
> GetCurrentFullTransactionId()
> GetCurrentCommandId(true)
> EnterParallelMode();
> InitializeParallelDSM() --> calls SerializeTransactionState()
> (both full txn id and command id are serialized into parallel DSM)
>
> In the workers:
> ParallelWorkerMain() --> calls StartParallelWorkerTransaction() (both
> full txn id and command id are restored into workers'
> CurrentTransactionState->fullTransactionId and currentCommandId)
> If the parallel workers are meant for insertions, then we need to set
> currentCommandIdUsed = true; Maybe we can lift the assert in
> GetCurrentCommandId(), if we don't want to touch that function, then
> we can have a new function GetCurrentCommandidInWorker() whose
> functionality will be same as GetCurrentCommandId() without the
> Assert(!IsParallelWorker());.
>
> Am I missing something?
>
> If the above points are true, we might have to update the parallel
> copy patch set, test the use cases and post separately in the parallel
> copy thread in coming days.
>
Hi Bharath,
I pretty much agree with your above points.
I've attached an updated Parallel INSERT...SELECT patch, that:
- Only uses existing transaction state serialization support for
transfer of xid and cid.
- Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
currentCommandIdUsed=true at the start of a parallel operation (used
for Parallel INSERT case, where we know the currentCommandId has been
assigned to the worker at the start of the parallel operation).
- Tweaks the Assert condition within "used=true" parameter case in
GetCurrentCommandId(), so that it only fires if in a parallel worker
and currentCommandId is false - refer to the updated comment in that
function.
- Does not modify any existing GetCurrentCommandId() calls.
- Does not remove any existing parallel-related asserts/checks, except
for the "cannot insert tuples in a parallel worker" error in
heap_prepare_insert(). I am still considering what to do with the
original error-check here.
[- Does not yet cater for other exclusion cases that you and Vignesh
have pointed out]
This patch is mostly a lot cleaner, but does contain a possible ugly
hack, in that where it needs to call GetCurrentFullTransactionId(), it
must temporarily escape parallel-mode (recalling that parallel-mode is
set true right at the top of ExectePlan() in the cases of Parallel
INSERT/SELECT).
Regards,
Greg Nancarrow
Fujitsu Australia
Attachment | Content-Type | Size |
---|---|---|
0003-ParallelInsertSelect.patch | application/octet-stream | 29.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Abhijit Menon-Sen | 2020-09-28 03:39:24 | [PATCH] SET search_path += octopus |
Previous Message | Kyotaro Horiguchi | 2020-09-28 02:54:16 | Re: New statistics for tuning WAL buffer size |