| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> | 
|---|---|
| To: | Greg Nancarrow <gregn4422(at)gmail(dot)com> | 
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Parallel INSERT (INTO ... SELECT ...) | 
| Date: | 2020-09-26 05:30:04 | 
| Message-ID: | CALj2ACVLW2HJqUUOroitodAEyYg=picLCtGoB=vpkzw68g2mNg@mail.gmail.com | 
| Lists: | pgsql-hackers | 
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
>
> Again, there's a fundamental difference in the Parallel Insert case.
> Right at the top of ExecutePlan it calls EnterParallelMode().
> For ParallelCopy(), there is no such problem. EnterParallelMode() is
> only called just before ParallelCopyMain() is called. So it can easily
> acquire the xid before this, because then parallel mode is not set.
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> being included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe that there is any need for
> separately passing them in the DSM, and having to use those
> AssignXXXXForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check).
>
Thanks, Greg, for the detailed points.
I checked further on the full txn id and command id. Yes, both are
passed to the workers via InitializeParallelDSM() ->
SerializeTransactionState(). Here is my summary of what we need to do
for parallel inserts in general, i.e. parallel COPY, parallel inserts
in INSERT INTO ... SELECT, and parallel inserts in CTAS.
In the leader:
    GetCurrentFullTransactionId()
    GetCurrentCommandId(true)
    EnterParallelMode();
    InitializeParallelDSM() --> calls SerializeTransactionState()
(both full txn id and command id are serialized into parallel DSM)
In the workers:
    ParallelWorkerMain() --> calls StartParallelWorkerTransaction()
(both full txn id and command id are restored into the workers'
CurrentTransactionState->fullTransactionId and currentCommandId)
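The ordering above can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: every `Toy*` type and `toy_*` function is a hypothetical stand-in, and the "DSM" is a plain byte buffer. It assumes only what is described above, plus the fact that an xid cannot be newly assigned once parallel mode is on, which is why the leader acquires it first. The real serialized transaction state carries more than these two fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-backend transaction state. */
typedef struct ToyXactState
{
    uint64_t full_xid;        /* 0 means "no xid assigned yet" */
    uint32_t command_id;
    bool     command_id_used;
    bool     parallel_mode;
} ToyXactState;

/* Hypothetical stand-in for the part of the DSM discussed here. */
typedef struct ToySerializedXact
{
    uint64_t full_xid;
    uint32_t command_id;
} ToySerializedXact;

static uint64_t toy_next_xid = 100;

/* Models xid assignment: refused once parallel mode has been entered. */
bool toy_assign_xid(ToyXactState *s)
{
    if (s->full_xid != 0)
        return true;          /* already assigned, nothing to do */
    if (s->parallel_mode)
        return false;         /* too late: cannot assign in parallel mode */
    s->full_xid = toy_next_xid++;
    return true;
}

/* Leader: acquire xid and cid first, then enter parallel mode, then serialize. */
bool toy_leader_setup(ToyXactState *leader, unsigned char *dsm, size_t dsm_len)
{
    ToySerializedXact ser;

    if (dsm_len < sizeof(ser))
        return false;
    if (!toy_assign_xid(leader))          /* ~ GetCurrentFullTransactionId() */
        return false;
    leader->command_id_used = true;       /* ~ GetCurrentCommandId(true) */
    leader->parallel_mode = true;         /* ~ EnterParallelMode() */

    ser.full_xid = leader->full_xid;      /* ~ SerializeTransactionState() */
    ser.command_id = leader->command_id;
    memcpy(dsm, &ser, sizeof(ser));
    return true;
}

/* Worker: restore xid and cid from the buffer filled by the leader. */
void toy_worker_restore(ToyXactState *worker, const unsigned char *dsm)
{
    ToySerializedXact ser;

    assert(dsm != NULL);
    memcpy(&ser, dsm, sizeof(ser));       /* ~ StartParallelWorkerTransaction() */
    worker->full_xid = ser.full_xid;
    worker->command_id = ser.command_id;
    worker->command_id_used = false;      /* not marked used by the restore */
    worker->parallel_mode = true;
}
```

In this model a worker ends up with the leader's xid and cid without any separate hand-off, while a state that enters parallel mode before `toy_assign_xid()` is called never gets an xid.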
If the parallel workers are meant for insertions, then we need to set
currentCommandIdUsed = true in them. Maybe we can lift the assert in
GetCurrentCommandId(). If we don't want to touch that function, we
can add a new function, GetCurrentCommandidInWorker(), which does the
same as GetCurrentCommandId() but without the
Assert(!IsParallelWorker()).
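The second option could look roughly like the following. Again this is a self-contained toy, not a patch: `ToyCommandState` and both `toy_*` functions are hypothetical, and the first function only mimics the shape of GetCurrentCommandId(used), where marking the command id as used is asserted not to happen in a parallel worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the command-id related backend state. */
typedef struct ToyCommandState
{
    uint32_t current_command_id;
    bool     current_command_id_used;
    bool     is_parallel_worker;
} ToyCommandState;

/* Shape of the existing function: "used" must not be set in a worker. */
uint32_t toy_get_current_command_id(ToyCommandState *s, bool used)
{
    if (used)
    {
        assert(!s->is_parallel_worker);
        s->current_command_id_used = true;
    }
    return s->current_command_id;
}

/* Proposed variant for inserting workers: same logic, no worker assertion. */
uint32_t toy_get_current_command_id_in_worker(ToyCommandState *s, bool used)
{
    if (used)
        s->current_command_id_used = true;
    return s->current_command_id;
}
```

The variant leaves the existing function and its callers untouched, at the cost of a second entry point whose callers have to be limited to workers that really perform insertions.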
Am I missing something?
If the above points are true, we might have to update the parallel
copy patch set, test the use cases, and post it separately in the
parallel copy thread in the coming days.
Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com