Re: [DESIGN] ParallelAppend

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: [DESIGN] ParallelAppend
Date: 2015-07-29 01:44:59
Message-ID: 55B8301B.80407@lab.ntt.co.jp
Lists: pgsql-hackers


KaiGai-san,

On 2015-07-28 PM 09:58, Kouhei Kaigai wrote:
>>
>> From my understanding of the parallel seqscan patch, each worker's
>> PartialSeqScan asks for a block to scan using a shared parallel heap scan
>> descriptor that effectively keeps track of the division of work among
>> PartialSeqScans in terms of blocks. What if we invent a PartialAppend
>> that each worker would run in case of a parallelized Append? It would use
>> some kind of shared descriptor to pick a relation (Append member) to
>> scan. The shared structure could be the list of sub-plans, including a
>> mutex for concurrency. It doesn't sound as effective as the proposed
>> ParallelHeapScanDescData is for PartialSeqScan, but anything more
>> granular might get complicated. For example, consider a
>> (current_relation, current_block) pair. If there are more workers than
>> sub-plans/partitions, then multiple workers might start working on the
>> same relation after a round-robin assignment of relations (though, of
>> course, a later worker would start scanning from a later block in the
>> same relation). I imagine that might help with parallelism across
>> volumes, if that's the case.
>>
> I initially thought ParallelAppend kicks off a fixed number of background
> workers towards sub-plans, according to the cost estimated at the planning
> stage. However, I'm now inclined to have each background worker pick up an
> uncompleted PlannedStmt first. (For more details, please see the reply to
> Amit Kapila.) It is a less fine-grained distribution of jobs among
> workers. Once the number of workers gets larger than the number of
> volumes / partitions, two or more workers begin to be assigned the same
> PartialSeqScan, which then takes over fine-grained job distribution using
> the shared parallel heap scan descriptor.
>
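
To make sure we are imagining the same thing, the shared structure I had in
mind is roughly the following (all names hypothetical; just a sketch, not
actual code, and completion tracking is omitted):

#include "storage/spin.h"

typedef struct ParallelAppendDescData
{
	slock_t		pa_mutex;		/* protects pa_next_plan */
	int			pa_nplans;		/* number of Append sub-plans */
	int			pa_next_plan;	/* next sub-plan to hand out */
} ParallelAppendDescData;

/*
 * Each worker calls this to pick the sub-plan to work on next.  The
 * counter wraps around, so once there are more workers than sub-plans,
 * the extra workers double up on a sub-plan and rely on the shared
 * parallel heap scan descriptor for block-level division of work.
 */
static int
parallel_append_next_plan(ParallelAppendDescData *pad)
{
	int			plan;

	SpinLockAcquire(&pad->pa_mutex);
	plan = pad->pa_next_plan;
	pad->pa_next_plan = (pad->pa_next_plan + 1) % pad->pa_nplans;
	SpinLockRelease(&pad->pa_mutex);

	return plan;
}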

I like your idea of round-robin assignment of partial/non-partial sub-plans
to workers. Do you think there are two cost considerations here? The
sub-plans themselves could have parallel paths to consider, and (I think)
your proposal introduces a new consideration: a plain old synchronous
Append path vs. a parallel asynchronous Append with a Funnel (below/above?)
it. I guess the asynchronous version would always be cheaper. So, even if
we end up with non-parallel sub-plans, do we still add a Funnel to make the
Append asynchronous? Am I missing something?
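
(Just so we are comparing the same two shapes -- assuming the Funnel goes
above the Append -- I mean something like:

    Append                         Funnel
      -> SeqScan on child_1          -> PartialAppend
      -> SeqScan on child_2               -> PartialSeqScan on child_1
      -> SeqScan on child_3               -> PartialSeqScan on child_2
                                          -> PartialSeqScan on child_3

with the left one costed as today and the right one getting the new
asynchronous treatment.)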

>> MergeAppend parallelization might involve a bit more complication, but
>> may be feasible with a PartialMergeAppend using a slightly different
>> kind of coordination among workers. What do you think of such an
>> approach?
>>
> Do we need to have something special in ParallelMergeAppend?
> If the individual child nodes are designed to return sorted results,
> what we have to do seems the same to me.
>

Sorry, I was needlessly worried; I had not realized that MergeAppend
already uses a binaryheap to pick the next tuple to return from among its
children.
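
For reference, the relevant part of ExecMergeAppend() looks roughly like
this (condensed from src/backend/executor/nodeMergeAppend.c; the heap
holds sub-plan indexes, ordered by each sub-plan's current tuple):

	if (!node->ms_initialized)
	{
		/* Pull the first tuple from every sub-plan and build the heap. */
		for (i = 0; i < node->ms_nplans; i++)
		{
			node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
			if (!TupIsNull(node->ms_slots[i]))
				binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
		}
		binaryheap_build(node->ms_heap);
		node->ms_initialized = true;
	}
	else
	{
		/*
		 * Refill from whichever sub-plan we returned a tuple from last
		 * time; sift it back into the heap, or drop it if exhausted.
		 */
		i = DatumGetInt32(binaryheap_first(node->ms_heap));
		node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
		if (!TupIsNull(node->ms_slots[i]))
			binaryheap_replace_first(node->ms_heap, Int32GetDatum(i));
		else
			(void) binaryheap_remove_first(node->ms_heap);
	}

	/* The heap's top, if any, is the next tuple in sort order. */
	if (binaryheap_empty(node->ms_heap))
		return ExecClearTuple(node->ps.ps_ResultTupleSlot);
	i = DatumGetInt32(binaryheap_first(node->ms_heap));
	return node->ms_slots[i];

So the per-child streams stay as they are; only the heap-based merge at
the top consumes them.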

Thanks,
Amit
