From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Greg Nancarrow <gregn4422(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel INSERT (INTO ... SELECT ...) |
Date: | 2020-10-12 03:31:42 |
Message-ID: | CAA4eK1LaApQHzbtwdPE8RYTdKBOXbmUO1qd-G9zE0QC8iRLAVA@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Oct 12, 2020 at 6:51 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Sat, Oct 10, 2020 at 3:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> > > you're right, neither of those additional "commandType == CMD_SELECT"
> > > checks are needed, so I'll remove them.
> > >
> >
> > Okay, that makes sense.
> >
>
> For the minimal patch (just allowing INSERT with parallel SELECT),
> there are issues with parallel-mode and various parallel-mode-related
> checks in the code.
> Initially, I thought it was only a couple of XID-related checks (which
> could perhaps just be tweaked to check for IsParallelWorker() instead,
> as you suggested), but I now realise that there are a lot more cases.
> This stems from the fact that just having a parallel SELECT (as part
> of non-parallel INSERT) causes parallel-mode to be set for the WHOLE
> plan. I'm not sure why parallel-mode is set globally like this, for
> the whole plan. Couldn't it just be set for the scope of
> Gather/GatherMerge? Otherwise, errors from these checks seem to be
> misleading when outside the scope of Gather/GatherMerge, as
> technically they are not occurring within the scope of parallel-leader
> and parallel-worker(s). The global parallel-mode wouldn't have been an
> issue before, because up to now INSERT has never had underlying
> parallel operations.
>
That is right, but there is another operation that works like that.
For example, a statement like "create table test_new As select * from
test_parallel where c1 < 1000;" will use a parallel select, but the write
operation will be performed in the leader. I agree that the code flow of
Insert is different, so we will face a different set of challenges in
that case, but there shouldn't be any fundamental problem in making it
work.

> For example, when running the tests under
> "force_parallel_mode=regress", the test failures show that there are a
> lot more cases affected:
>
> "cannot assign TransactionIds during a parallel operation"
> "cannot assign XIDs during a parallel operation"
> "cannot start commands during a parallel operation"
> "cannot modify commandid in active snapshot during a parallel operation"
> "cannot execute nextval() during a parallel operation"
> "cannot execute INSERT during a parallel operation"
> "cannot execute ANALYZE during a parallel operation"
> "cannot update tuples during a parallel operation"
>
> (and there are more not currently detected by the tests, found by
> searching the code).
>
Did you get these after applying your patch? If so, can you share the
version you are using, or, if you have already posted it, point me to
it?

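The quoted errors come from guards that test whether parallel mode is active. A minimal sketch, assuming two hypothetical process-wide flags that stand in for PostgreSQL's IsInParallelMode() and IsParallelWorker() (this is not PostgreSQL source), contrasts the existing style of check with the IsParallelWorker()-only tweak mentioned upthread:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PostgreSQL's IsInParallelMode() and
 * IsParallelWorker(); in this sketch they are plain process-wide flags. */
static bool in_parallel_mode = false;   /* set once for the WHOLE plan */
static bool is_parallel_worker = false; /* true only inside a worker */

/* Existing style of check: refuse whenever parallel mode is active, even
 * in the leader. Returning false stands in for raising an error such as
 * "cannot assign XIDs during a parallel operation". */
static bool can_assign_xid_strict(void)
{
    return !(in_parallel_mode || is_parallel_worker);
}

/* Tweak suggested upthread: refuse only inside a worker, so a leader that
 * performs the INSERT itself could still obtain an XID. */
static bool can_assign_xid_worker_only(void)
{
    return !is_parallel_worker;
}

/* Usage: the leader of an INSERT ... SELECT whose plan contains a Gather.
 * Parallel mode is on for the whole plan, but this process is no worker. */
static void demo_leader_insert(void)
{
    in_parallel_mode = true;
    is_parallel_worker = false;
    assert(!can_assign_xid_strict());     /* existing check fires */
    assert(can_assign_xid_worker_only()); /* tweaked check allows it */
}
```

Since the list above shows many checks beyond the XID ones, a tweak of this kind would have to be repeated, and justified, at each of those call sites.
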
> As an example, with the minimal patch applied, if you had a trigger on
> INSERT that, say, attempted a table creation or UPDATE/DELETE, and you
> ran an "INSERT INTO ... SELECT...", it would treat the trigger
> operations as being attempted in parallel-mode, and so an error would
> result.
>
Oh, I guess this happens because you need to execute the Insert in
parallel-mode even though the Insert is happening in the leader, right?
And we are probably not facing this with "Create Table As ..." because
no trigger execution is involved there.

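A toy model of the trigger case, assuming a single hypothetical flag for parallel mode and a hypothetical try_write() guard in the spirit of "cannot update tuples during a parallel operation" (none of these names are PostgreSQL functions): every row of the INSERT ... SELECT is processed in the leader while the flag is set for the whole plan, so a row trigger that writes is refused, whereas a path with no trigger (as with CREATE TABLE AS) completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: parallel mode, set for the whole plan. */
static bool in_parallel_mode = false;

/* Hypothetical guard: any write while parallel mode is on is refused. */
static bool try_write(void) { return !in_parallel_mode; }

/* Hypothetical row trigger whose body performs its own UPDATE/DELETE. */
typedef bool (*trigger_fn)(void);
static bool trigger_that_writes(void) { return try_write(); }

/* Leader-side insert of one row. The leader's own heap insert is assumed
 * to be allowed (that is what the minimal patch has to arrange); only the
 * trigger's write goes through the guard. */
static bool leader_insert_row(trigger_fn trig)
{
    if (trig != NULL && !trig())
        return false;
    return true;
}

/* Pushes n rows from a parallel SELECT through the leader with parallel
 * mode on for the whole plan; returns rows inserted before a failure. */
static size_t run_insert_select(size_t n, trigger_fn trig)
{
    size_t done = 0;

    assert(!in_parallel_mode); /* parallel mode is not nested here */
    in_parallel_mode = true;   /* plan contains a Gather somewhere */
    for (size_t i = 0; i < n; i++) {
        if (!leader_insert_row(trig))
            break;
        done++;
    }
    in_parallel_mode = false;  /* end of plan */
    return done;
}
```

Calling run_insert_select(5, NULL) inserts all five rows, while run_insert_select(5, trigger_that_writes) stops at the first row.
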
> Let me know your thoughts on how to deal with these issues.
> Can you see a problem with only having parallel-mode set for the scope of
> Gather/GatherMerge, or do you have some other idea?
>
I have not thought about this yet, but I don't understand your
proposal. How would you set it only for the scope of Gather (Merge)?
The execution of the Gather node will be interleaved with that of the
Insert node: basically, you fetch a tuple from Gather, and then you
need to insert it. Can you be a bit more specific about what you have
in mind here?

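The interleaving can be made concrete with a toy pull-based loop. In this sketch (all names are hypothetical and not PostgreSQL's executor API), an Insert node repeatedly pulls from a Gather node and records a trace: 'G' for a fetch, 'I' for an insert. For three rows the trace is GIGIGI, so the "scope of Gather" is not one contiguous region but is split at every tuple, whereas in a real parallel plan the workers feeding Gather would keep running in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy Gather: yields the integers 0 .. limit-1, one per call. */
typedef struct {
    int next;  /* next value the "workers" will hand over */
    int limit; /* total rows the parallel SELECT returns */
} GatherState;

/* Trace of executor events: 'G' = fetched from Gather, 'I' = inserted. */
typedef struct {
    char events[64];
    size_t n_events;
} Trace;

static void record(Trace *t, char ev)
{
    if (t->n_events < sizeof t->events - 1) {
        t->events[t->n_events++] = ev;
        t->events[t->n_events] = '\0';
    }
}

/* Returns false once Gather is exhausted. */
static bool gather_next(GatherState *g, Trace *t, int *out)
{
    if (g->next >= g->limit)
        return false;
    *out = g->next++;
    record(t, 'G');
    return true;
}

static void insert_tuple(Trace *t, int value)
{
    (void)value; /* the actual heap insert is elided */
    record(t, 'I');
}

/* Insert node main loop: fetch one tuple, insert it, repeat. */
static void exec_insert(GatherState *g, Trace *t)
{
    int v;

    assert(g != NULL && t != NULL);
    while (gather_next(g, t, &v))
        insert_tuple(t, v);
}
```

Entering and leaving parallel mode only around Gather would therefore mean toggling it on every fetch, while workers are still active.
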
--
With Regards,
Amit Kapila.