Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-01-08 09:59:45
Message-ID: CAA4eK1K9UrUV_UZabuPdL1WckM_s-gATYoNYLG7yagzVArZacg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2021 at 12:21 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > > As an alternative, have you considered allocation of the XID even in parallel
> > > mode? I imagine that the first parallel worker that needs the XID for
> > > insertions allocates it and shares it with the other workers as well as with
> > > the leader process.
> > >
> >
> > As a matter of this patch
> > (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
> > never need to allocate xids by workers because Insert is always
> > performed by leader backend.
>
> When writing this comment, I was actually thinking of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
> than v11-0001, see below. On the other hand, if we allowed XID allocation in
> the parallel mode (as a separate patch), even the 0001 patch would get a bit
> simpler.
>
> > Even, if we want to do what you are suggesting it would be tricky because
> > currently, we don't have such an infrastructure where we can pass
> > information among workers.
>
> How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
> workers "meet" at the barrier when they want to insert the first tuple. Then
> one of them allocates the XID, makes it available to others (via shared
> memory) and all the workers can continue.
>

Even if we want to do this, I am not sure we need barriers, because
there is no intrinsic need for all workers to stop before the XID is
allocated. After the XID is allocated, we just need some way for the
other workers to use it, maybe something like the way workers currently
synchronize to get the next block number to process in a parallel
sequential scan. But the question is whether it is really worth it,
because in many cases the XID would already be allocated by the time
the parallel DML operation starts, and we do share it at the beginning.
I think if we really want to allow XID allocation in parallel mode,
then we should also consider allowing it for subtransaction XIDs, not
only for the main transaction, and that will open up some other use
cases. I feel it is better to tackle that problem separately.
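
To make the "no barrier needed" point concrete, here is a rough
standalone sketch in plain C11 (not PostgreSQL code). The first worker
that needs the XID allocates it under a small lock and publishes it in
shared state; every other worker just reads the published value and
nobody waits for all workers to arrive. The names SharedXidState,
shared_xid_init, shared_xid_get, xid_t and INVALID_XID are all made up
for illustration, and the next_xid counter is only a toy stand-in for
the real XID allocator.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t xid_t;              /* hypothetical stand-in for TransactionId */
#define INVALID_XID ((xid_t) 0)      /* hypothetical "not yet allocated" marker */

/* Hypothetical state that would live in memory shared by leader and workers. */
typedef struct SharedXidState
{
    atomic_flag   lock;              /* tiny spinlock guarding allocation */
    _Atomic xid_t shared_xid;        /* published XID, or INVALID_XID */
    xid_t         next_xid;          /* toy allocator counter, protected by lock */
} SharedXidState;

/* Set up the shared state before any worker starts. */
void
shared_xid_init(SharedXidState *s, xid_t first_xid)
{
    assert(first_xid != INVALID_XID);
    atomic_flag_clear(&s->lock);
    atomic_init(&s->shared_xid, INVALID_XID);
    s->next_xid = first_xid;
}

/*
 * Return the XID for the parallel operation, allocating it on first use.
 * Callers that find it already published never take the lock.
 */
xid_t
shared_xid_get(SharedXidState *s)
{
    xid_t xid = atomic_load(&s->shared_xid);

    if (xid != INVALID_XID)
        return xid;                  /* fast path: someone already allocated */

    while (atomic_flag_test_and_set(&s->lock))
        ;                            /* spin until the lock is free */

    /* Re-check under the lock: another worker may have won the race. */
    xid = atomic_load(&s->shared_xid);
    if (xid == INVALID_XID)
    {
        xid = s->next_xid++;         /* "allocate" exactly one XID */
        atomic_store(&s->shared_xid, xid);
    }

    atomic_flag_clear(&s->lock);
    return xid;
}
```

Only the workers that race on the very first call ever contend on the
lock; once the value is published, each call is a single atomic load.
When the XID is already allocated before the parallel operation starts,
the leader would simply store it in shared_xid up front and the slow
path is never taken.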

--
With Regards,
Amit Kapila.
