From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-09 07:30:40
Message-ID: CA+hUKG+ZjyAy2Z-fjz4QFdXauoaLVc-Z2gg5p2vtP2shUeHgHA@mail.gmail.com
Lists: pgsql-hackers
On Fri, Oct 9, 2020 at 3:48 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> It does give me the incentive to look beyond that issue and see
> whether parallel Update and parallel Delete are indeed possible. I'll
> be sure to give it a go!
Cool!
A couple more observations:
+ pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;
Hmm, I think this may be bogus window dressing only affecting EXPLAIN.
If you change it to assign false always, it works just the same,
except EXPLAIN says:
Gather (cost=15428.00..16101.14 rows=1000000 width=4)
Workers Planned: 2
-> Insert on s (cost=15428.00..16101.14 rows=208334 width=4)
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=4)
... instead of:
Gather (cost=15428.00..16101.14 rows=1000000 width=4)
Workers Planned: 2
-> Parallel Insert on s (cost=15428.00..16101.14 rows=208334 width=4)
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=4)
AFAICS it's not parallel-aware; it just happens to be running in
parallel with a partial input and partial output (and, in this case, a
partial effect in terms of writes). Parallel-aware is our term for nodes that
actually know they are running in parallel and do some special
coordination with their twins in other processes.
The estimated row count also looks wrong; at a guess, the parallel
divisor is applied twice. Let me try that with
parallel_leader_participation=off (which disables some funky maths in
the row estimation and makes it straight division by number of
processes):
Gather (cost=17629.00..18645.50 rows=1000000 width=4)
Workers Planned: 2
-> Insert on s (cost=17629.00..18645.50 rows=250000 width=4)
-> Parallel Hash Join (cost=17629.00..37291.00 rows=500000 width=4)
[more nodes omitted]
Yeah, that was a join that spat out a million rows, and we correctly
estimated 500k per process, and then Insert (still with my hack to
turn off the bogus "Parallel" display in this case, but it doesn't
affect the estimation) estimated 250k per process, which is wrong.