From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-02-12 11:07:58
Message-ID: CAA4eK1K==zHzNUAjQzuW-MY7GcrV5vLMiJKWQW-zAxe3J_r4mA@mail.gmail.com
Lists: pgsql-hackers
On Thu, Feb 12, 2015 at 2:19 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Feb 10, 2015 at 3:56 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2015-02-10 09:23:02 -0500, Robert Haas wrote:
> >> On Tue, Feb 10, 2015 at 9:08 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >
> > As pointed out above (moved there after reading the patch...) I don't
> > think a chunk size of 1 or any other constant size can make sense. I
> > don't even believe it'll necessarily be constant across an entire query
> > execution (big initially, small at the end). Now, we could move
> > determining that before the query execution into executor
> > initialization, but then we won't yet know how many workers we're going
> > to get. We could add a function setting that at runtime, but that'd mix
> > up responsibilities quite a bit.
>
> I still think this belongs in heapam.c somehow or other. If the logic
> is all in the executor, then it becomes impossible for any code that
> doesn't use the executor to do a parallel heap scan, and that's
> probably bad. It's not hard to imagine something like CLUSTER wanting
> to reuse that code, and that won't be possible if the logic is up in
> some higher layer. If the logic we want is to start with a large
> chunk size and then switch to a small chunk size when there's not much
> of the relation left to scan, there's still no reason that can't be
> encapsulated in heapam.c.
>
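
To make the "large chunks first, small at the end" policy concrete, here is a
rough standalone sketch of what such logic in heapam.c could look like (the
function name, the divisor, and the cap are mine for illustration, not from
any posted patch):

#include <stdint.h>

typedef uint32_t BlockNumber;

/*
 * Illustrative only: shrink the chunk size as the scan nears the end, so
 * the final chunks divide evenly among workers instead of leaving one
 * worker with a large straggler chunk.
 */
static BlockNumber
choose_chunk_size(BlockNumber blocks_remaining, int nworkers)
{
    BlockNumber chunk = blocks_remaining / (BlockNumber) (nworkers * 8);

    if (chunk > 8192)   /* cap a single grab at 8192 blocks (64MB of 8K blocks) */
        chunk = 8192;
    if (chunk < 1)      /* near the end, fall back to handing out single blocks */
        chunk = 1;
    return chunk;
}
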
It seems to me that we need both approaches (make the heap or other lower
layers aware of parallelism, and also handle it at the executor level,
using callback_function and callback_state to make it work) for doing
parallelism. TBH, I think for the purposes of this patch we can go either
way and then think more about it as we move ahead to parallelize other
operations. So what I can do is try using Robert's patch to make the heap
aware of parallelism and then see how it comes out?
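
For the executor-level side, the callback shape I have in mind is roughly
the following (all of these names are illustrative, none exist in the
patch as posted):

#include <stddef.h>

typedef struct HeapTupleData *HeapTuple;   /* opaque for this sketch */

/*
 * Illustrative only: the executor registers a callback plus opaque state,
 * and the lower layer invokes it for every tuple in the chunks this
 * backend claims, keeping the chunk logic below the executor.
 */
typedef void (*ScanTupleCallback)(HeapTuple tuple, void *callback_state);

typedef struct ParallelScanArgs
{
    ScanTupleCallback callback;        /* called once per visible tuple */
    void             *callback_state;  /* executor-private state */
} ParallelScanArgs;
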
> > Btw, using an atomic uint32 you'd end up without the spinlock and just
> > about the same amount of code... Just do a atomic_fetch_add_until32(var,
> > 1, InvalidBlockNumber)... ;)
>
> I thought of that, but I think there's an overflow hazard.
>
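
To illustrate the overflow hazard and one way around it, here is a small
standalone model (mine, not from any patch): a bounded CAS loop refuses to
advance the counter once the scan is done, whereas an unconditional
fetch-add keeps incrementing on every late arrival and could eventually
wrap past InvalidBlockNumber.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/*
 * Illustrative only: claim the next block to scan.  Returns false once
 * all blocks up to nblocks have been handed out; the counter never moves
 * past nblocks, so it cannot overflow no matter how often it is called.
 */
static bool
claim_next_block(_Atomic BlockNumber *cur, BlockNumber nblocks,
                 BlockNumber *out)
{
    BlockNumber old = atomic_load(cur);

    do
    {
        if (old >= nblocks)
            return false;       /* scan complete; never advance further */
    } while (!atomic_compare_exchange_weak(cur, &old, old + 1));

    *out = old;
    return true;
}
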
> > Where the 'coordinated seqscan' scans a relation so that each tuple
> > eventually gets returned once across all nodes, but the nested loop (and
> > through it the index scan) will just run normally, without any
> > coordination and parallelism. But everything below --- would happen
> > in multiple nodes. If you agree, yes, then we're in violent agreement
> > ;). The "single node that gets copied" bit above makes me a bit unsure
> > whether we are though.
>
> Yeah, I think we're talking about the same thing.
>
> > To me, given the existing executor code, it seems easiest to achieve
> > that by having the ParallelismDrivingNode above having a dynamic number
> > of nestloop children in different backends and point the coordinated
> > seqscan to some shared state. As you point out, the number of these
> > children cannot be certainly known (just targeted for) at plan time;
> > that puts a certain limit on how independent they are. But since a
> > large number of them can be independent between workers it seems awkward
> > to generally treat them as being the same node across workers. But maybe
> > that's just an issue with my mental model.
>
> I think it makes sense to think of a set of tasks in which workers can
> assist. So you have a query tree which is just one query tree, with no
> copies of the nodes, and then there are certain places in that query
> tree where a worker can jump in and assist that node. To do that, it
> will have a copy of the node, but that doesn't mean that all of the
> stuff inside the node becomes shared data at the code level, because
> that would be stupid.
>
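
To illustrate that point about copies versus shared data, the split could
look something like this hypothetical pair of structs (names are mine):

#include <stdatomic.h>
#include <stdint.h>

/*
 * Illustrative only: the coordination state lives once in dynamic shared
 * memory; each backend that assists the node keeps its own private
 * execution state pointing at it.
 */
typedef struct SharedScanState
{
    _Atomic uint32_t next_block;   /* shared: next block to hand out */
    uint32_t         nblocks;      /* shared: relation size at scan start */
} SharedScanState;

typedef struct LocalScanState
{
    SharedScanState *shared;       /* points into shared memory */
    uint32_t         cur_block;    /* private: block this backend is on */
    /* buffers, tuple slots, etc. also stay backend-local */
} LocalScanState;
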
As per my understanding of the discussion related to this point, I think
there are 3 somewhat related ways to achieve this:
1. Both master and worker run the same node (ParallelSeqScan), where the
work done by the worker for this node (scan chunks of the heap) is a
subset of what is done by the master (coordinate the data returned by
workers + scan chunks of the heap). It seems to me Robert is advocating
this approach.
2. Master and worker use different nodes. The master runs a
parallelism-driving node (ParallelSeqScan - coordinate the data returned
by workers + scan chunks of the heap) and each worker runs some form of
parallelism-driven node (PartialSeqScan - scan chunks of the heap). It
seems to me Andres is proposing this approach.
3. Same as 2, but modify the existing SeqScan node to behave as
PartialSeqScan. This is what I have done in the patch.
Correct me or add here if I have misunderstood anything.
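
For illustration, option 2 could dispatch roughly like this (node names as
above; everything else is hypothetical, and the block-claiming and shared
state are as in the earlier sketches):

/*
 * Illustrative only: under option 2 the master and the workers execute
 * different node types over the same shared scan state; under option 1
 * there would be a single node type and the master simply also runs the
 * scan loop.
 */
typedef enum
{
    T_ParallelSeqScan,   /* master: gather worker tuples + scan chunks */
    T_PartialSeqScan     /* worker: scan chunks only */
} ScanNodeTag;

static void
exec_scan_step(ScanNodeTag tag)
{
    if (tag == T_ParallelSeqScan)
    {
        /* master first drains any tuples queued by workers ... */
    }
    /* ... then both node types claim and scan the next chunk of the
     * heap, sharing the block-claiming loop shown earlier. */
}
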
I think going forward (for cases like aggregation) the work done in the
master and worker nodes will differ substantially enough that it is
better to do the work as part of different nodes in the master and the
workers.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com