Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-04-24 12:32:34
Message-ID: CAA4eK1Jjv+4hw5ZYFK+-8ptT4nMWXdu2Y1-eBdBcdw4Nm_3yeA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 23, 2015 at 2:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Apr 22, 2015 at 8:48 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I have implemented this idea (note that I have to expose a new API
> > shm_mq_from_handle as TupleQueueFunnel stores shm_mq_handle* and
> > we need a shm_mq* to call shm_mq_detach) and apart from this I have fixed other
> > problems reported on this thread:
> >
> > 1. Execution of initPlan by master backend and then pass the
> > required PARAM_EXEC parameter values to workers.
> > 2. Avoid consuming dsm's by freeing the parallel context after
> > the last tuple is fetched.
> > 3. Allow execution of Result node in worker backend as that can
> > be added as a gating filter on top of PartialSeqScan.
> > 4. Merged parallel heap scan descriptor patch
> >
> > To apply the patch, please follow below sequence:
> >
> > HEAD Commit-Id: 4d930eee
> > parallel-mode-v9.patch [1]
> > assess-parallel-safety-v4.patch [2] (don't forget to run fixpgproc.pl in
> > the patch)
> > parallel_seqscan_v14.patch (Attached with this mail)
>
> Thanks, this version looks like an improvement. However, I still see
> some problems:
>
> - I believe the separation of concerns between ExecFunnel() and
> ExecEndFunnel() is not quite right. If the scan is shut down before
> it runs to completion (e.g. because of LIMIT), then I think we'll call
> ExecEndFunnel() before ExecFunnel() hits the TupIsNull(slot) path. I
> think you probably need to create a static subroutine that is called
> both as soon as TupIsNull(slot) and also from ExecEndFunnel(), in each
> case cleaning up whatever resources remain.
>

Right, will fix as per suggestion.
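Robert's suggestion amounts to an idempotent shutdown helper that both the end-of-scan path and ExecEndFunnel can call. A minimal sketch (the struct fields and names here are illustrative, not from the actual patch):

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the funnel node's executor state. */
typedef struct FunnelState
{
    void *pcxt;        /* parallel context; NULL once destroyed */
    bool  scan_done;
} FunnelState;

/*
 * Idempotent cleanup: called both when ExecFunnel sees TupIsNull(slot)
 * and from ExecEndFunnel, so an early shutdown (e.g. under LIMIT) and a
 * normal end-of-scan take the same path, and a second call is a no-op.
 */
static void
FunnelShutdown(FunnelState *node)
{
    if (node->pcxt != NULL)
    {
        /* DestroyParallelContext(node->pcxt) in the real executor */
        node->pcxt = NULL;
    }
    node->scan_done = true;
}
```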

> - InitializeParallelWorkers() still mixes together general parallel
> executor concerns with concerns specific to parallel sequential scan
> (e.g. EstimatePartialSeqScanSpace).

Here we are doing two things: the first is for the planned statement, and
the second is node-specific, which in this case is the parallel heap scan
descriptor. So if I understand correctly, you want us to remove the second
one and use a recursive function to achieve the same.

>
> - shm_mq_from_handle() is probably reasonable, but can we rename it
> shm_mq_get_queue()?
>

Okay, will change.
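For reference, the renamed function is just a thin accessor over the handle. A sketch with a mocked-up struct (the real shm_mq_handle layout is private to shm_mq.c):

```c
typedef struct shm_mq shm_mq;   /* opaque queue living in shared memory */

/* Mocked handle; only the field the accessor needs. */
typedef struct shm_mq_handle
{
    shm_mq *mqh_queue;
} shm_mq_handle;

/*
 * Return the shm_mq backing this handle, e.g. so TupleQueueFunnel,
 * which stores shm_mq_handle pointers, can call shm_mq_detach.
 */
static shm_mq *
shm_mq_get_queue(shm_mq_handle *mqh)
{
    return mqh->mqh_queue;
}
```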

> - It's hard to believe this is right:
>
> + if (parallelstmt->inst_options)
> + receiver = None_Receiver;
>
> Really? Flush the tuples if there are *any instrumentation options
> whatsoever*? At the very least, that doesn't look too future-proof,
> but I'm suspicious that it's outright incorrect.
>

The instrumentation info is for the EXPLAIN statement, where we don't need
tuples, and the receiver is set the same way there as well; refer to
ExplainOnePlan(). What makes you feel this is incorrect?

> - I think ParallelStmt probably shouldn't be defined in parsenodes.h.
> That file is included in a lot of places, and adding all of those
> extra #includes there doesn't seem like a good idea for modularity
> reasons even if you don't care about partial rebuilds. Something that
> includes a shm_mq obviously isn't a "parse" node in any meaningful
> sense anyway.
>

How about tcop/tcopprot.h?

> - I don't think you need both setup cost and startup cost. Starting
> up more workers isn't particularly more expensive than starting up
> fewer of them, because most of the overhead is in waiting for them to
> actually start, and if the number of workers is reasonable, they'll
> all be doing that in parallel with each other. I suggest removing
> parallel_startup_cost and keeping parallel_setup_cost.
>

There is some work (like creation of shm queues and launching of workers)
that is done in proportion to the number of workers during setup time. I
have kept two parameters to distinguish such work. You have a point that
the start of some or all workers could happen in parallel, but I feel that
work is still proportional to the number of workers. Such a parameter could
also be useful for future parallel operations where we need to set up IPC
between workers, or other work that is proportional to the number of
workers.
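The distinction between the two parameters can be written as a simple cost expression (a sketch; parameter names and values are illustrative, not the actual GUC defaults):

```c
/*
 * setup_cost: one-time cost of establishing the parallel context,
 * independent of worker count.
 * startup_cost: per-worker cost (creating a shm queue, launching
 * the worker), which scales with nworkers.
 */
static double
parallel_overhead(double setup_cost, double startup_cost, int nworkers)
{
    return setup_cost + startup_cost * nworkers;
}
```

Collapsing the two into one parameter, as Robert suggests, assumes the per-worker term is negligible relative to the fixed term.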

> - In cost_funnel(), I don't think it's right to divide the run cost by
> nWorkers + 1. Suppose we've got a plan that looks like this:
>
> Funnel
> -> Hash Join
> -> Partial Seq Scan on a
> -> Hash
> -> Seq Scan on b
>
> The sequential scan on b is going to get executed once per worker,
> whereas the effort for the sequential scan on a is going to be divided
> over all the workers. So the right way to cost this is as follows:
>
> (a) The cost of the partial sequential scan on a is equal to the cost
> of a regular sequential scan, plus a little bit of overhead to account
> for communication via the ParallelHeapScanDesc, divided by the number
> of workers + 1.
> (b) The cost of the remaining nodes under the funnel works normally.
> (c) The cost of the funnel is equal to the cost of the hash join plus
> number of tuples multiplied by per-tuple communication overhead plus a
> large fixed overhead reflecting the time it takes the workers to
> start.
>

IIUC, the change for this would be to remove the run-cost change (dividing
the run cost by nWorkers + 1) from cost_funnel and make a similar change,
as suggested by point (a), in the cost calculation of the partial
sequential scan. As of now, we don't do anything that can move a Funnel
node on top of a hash join, so I am not sure if you are expecting any extra
handling as part of points (b) or (c).
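Point (a) amounts to the following calculation (a sketch with made-up numbers; the coordination overhead stands in for the communication via the ParallelHeapScanDesc):

```c
/*
 * Cost a partial seq scan as a regular seq scan plus a small
 * coordination overhead, with the total spread across the workers
 * and the leader (hence nworkers + 1).
 */
static double
partial_seqscan_cost(double seqscan_cost, double coord_overhead,
                     int nworkers)
{
    return (seqscan_cost + coord_overhead) / (nworkers + 1);
}
```

Under this scheme only the partial scan is divided by the worker count; nodes such as the inner seq scan on b, which every worker executes in full, keep their normal cost, matching point (b).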

> - While create_parallelscan_paths() is quite right to limit the number
> of workers to no more than the number of pages, it's pretty obvious
> that in practice that's way too conservative. I suggest we get
> significantly more aggressive about that, like limiting ourselves to
> one worker per thousand pages. We don't really know exactly what the
> costing factors should be here just yet, but we certainly know that
> spinning up lots of workers to read a handful of pages each must be
> dumb. And we can save a significant amount of planning time here by
> not bothering to generate parallel paths for little tiny relations.
>

Makes sense, will change.
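The suggested heuristic could look like this (constants and names are illustrative, not from the patch):

```c
/*
 * One worker per thousand pages, capped by the configured maximum.
 * Returning 0 means the relation is too small to bother generating
 * a parallel path at all, which also saves planning time.
 */
static int
choose_parallel_workers(double rel_pages, int max_workers)
{
    int nworkers = (int) (rel_pages / 1000.0);

    if (nworkers > max_workers)
        nworkers = max_workers;
    return nworkers;
}
```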

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
