Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject:	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date:	2017-12-31 16:29:02
Message-ID:	CAH2-Wzmp3kPCcd2boZiYD6y5T2nikE-r2Utvpj_k33yrSoTsfA@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Dec 12, 2017 at 2:09 AM, Rushabh Lathia
<rushabh(dot)lathia(at)gmail(dot)com> wrote:
>> I now believe that index_create() should reject catalog parallel
>> CREATE INDEX directly, just as it does for catalog CREATE INDEX
>> CONCURRENTLY. That logic should be generic to all AMs, since the
>> reasons for disallowing catalog parallel index builds are generic.
>>
>
> Sorry I didn't get this, reject means? you mean it should throw an error
> catalog parallel CREATE INDEX? or just suggesting to set the
> ParallelWorkers and may be LeaderAsWorker from index_create()
> or may be index_build()?

I mean that we should be careful to make sure that AM-generic parallel
CREATE INDEX logic does not end up in a specific AM (nbtree).

The patch *already* refuses to perform a parallel CREATE INDEX on a
system catalog, which is what I meant by reject (sorry for being
unclear). The point is that that's due to a restriction that has
nothing to do with nbtree in particular (just like the CIC restriction
on catalogs), so it should be performed within index_build(). Just
like the similar CONCURRENTLY-on-a-catalog restriction, though without
throwing an error, since of course the user doesn't explicitly ask for
a parallel CREATE INDEX at any point (unlike CONCURRENTLY).

Once we go this way, the cost model has to be called at that point,
too. We already have the AM-specific "OldIndex->rd_rel->relam ==
BTREE_AM_OID" tests within cluster.c, even though theoretically
another AM might be involved with CLUSTER in the future, which this
seems similar to.

So, I propose the following (this is a rough outline):

* Add new IndexInfo fields after ii_Concurrent/ii_BrokenHotChain --
ii_ParallelWorkers and ii_LeaderAsWorker.

* Call plan_create_index_workers() within index_create(), assigning to
ii_ParallelWorkers, and fill in ii_LeaderAsWorker from the
parallel_leader_participation GUC. Add comments along the lines of
"only nbtree supports parallel builds". Test the index with a
"heapRelation->rd_rel->relam == BTREE_AM_OID" to make this work.
Otherwise, assign zero to ii_ParallelWorkers (and leave
ii_LeaderAsWorker as false).

* For builds on catalogs, or builds using other AMs, don't let
parallelism go ahead by immediately assigning zero to
ii_ParallelWorkers within index_create(), near where the similar CIC
test occurs already.
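
As a rough illustration of the outline above, here is a standalone C sketch
of the decision. It is not PostgreSQL code: SketchIndexInfo,
SketchBuildRequest, and sketch_set_parallelism() are hypothetical names made
up for this example, planned_workers merely stands in for whatever
plan_create_index_workers() would return, and leader_participation stands in
for the parallel_leader_participation GUC. Only the two field names come
from the proposal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in holding only the two proposed IndexInfo fields. */
typedef struct SketchIndexInfo
{
    int  ii_ParallelWorkers;
    bool ii_LeaderAsWorker;
} SketchIndexInfo;

/* Hypothetical summary of what the caller knows at index_create() time. */
typedef struct SketchBuildRequest
{
    bool is_catalog;            /* index is being built on a system catalog */
    bool is_btree;              /* index AM is nbtree */
    int  planned_workers;       /* assumed result of the cost model */
    bool leader_participation;  /* assumed value of the GUC */
} SketchBuildRequest;

/*
 * Fill in the parallelism fields in an AM-generic place. Catalog builds
 * and non-nbtree AMs silently fall back to a serial build; no error is
 * raised, because the user never explicitly asked for parallelism.
 */
void
sketch_set_parallelism(SketchIndexInfo *info, const SketchBuildRequest *req)
{
    assert(req->planned_workers >= 0);

    /* Default: serial build. */
    info->ii_ParallelWorkers = 0;
    info->ii_LeaderAsWorker = false;

    /* Only nbtree supports parallel builds, and never on catalogs. */
    if (req->is_catalog || !req->is_btree)
        return;

    info->ii_ParallelWorkers = req->planned_workers;
    info->ii_LeaderAsWorker = req->leader_participation;
}
```

The point of the sketch is only that the AM itself never has to make this
decision: it reads ii_ParallelWorkers and ii_LeaderAsWorker and acts on
them.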

What do you think of that?

>> Do you think that the main part of the cost model needs to care about
>> parallel_leader_participation, too?
>>
>> compute_parallel_worker() assumes that the caller is planning a
>> parallel-sequential-scan-alike thing, in the sense that the leader
>> only acts like a worker in cases that probably don't have many
>> workers, where the leader cannot keep itself busy as a leader. That's
>> actually quite different to parallel CREATE INDEX, because the
>> leader-as-worker state will behave in exactly the same way as a worker
>> would, no matter how many workers there are. The leader process is
>> guaranteed to give its full attention to being a worker, because it
>> has precisely nothing else to do until workers finish. This makes me
>> think that we may need to immediately do something with the result of
>> compute_parallel_worker(), to consider whether or not a
>> leader-as-worker state should be used, despite the fact that no
>> existing compute_parallel_worker() caller does anything like this.
>>
>
> I agree with you. compute_parallel_worker() mainly design for the
> scan-alike things. Where as parallel create index is different in a
> sense where leader has as much power as worker. But at the same
> time I don't see any side effect or negative of that with PARALLEL
> CREATE INDEX. So I am more towards not changing that at least
> for now - as part of this patch.

I've also noticed that there is little to no negative effect on
CREATE INDEX duration from adding new workers past the point where
adding more workers stops making the build faster. It's quite clear.
And, in general, there isn't all that much theoretical justification
for the cost model (it's essentially the same as any other parallel
scan), which doesn't seem to matter much. So, I agree that it doesn't
really matter in practice, but disagree that it should not still be
changed -- the justification may be a little thin, but I think that we
need to stick to it. There should be a theoretical justification for
the cost model that is coherent in the wider context of cost models
for parallelism in general. It should not be arbitrarily inconsistent
just because it apparently doesn't matter that much. It's easy to fix
-- let's just fix it.

--
Peter Geoghegan
