Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-02-09 07:31:39
Message-ID: CAA4eK1+oA4yVa9EuVNYbkp_PSZZPGoMtG6D-i5wJiJ-+XnCOoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 7, 2015 at 2:30 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Feb 6, 2015 at 2:13 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > The complicated part here seems to me to figure out what we need to
> > pass from the parallel leader to the parallel worker to create enough
> > state for quals and projection. If we want to be able to call
> > ExecScan() without modification, which seems like a good goal, we're
> > going to need a ScanState node, which is going to need to contain
> > valid pointers to (at least) a ProjectionInfo, an ExprContext, and a
> > List of quals. That in turn is going to require an ExecutorState.
> > Serializing those things directly doesn't seem very practical; what we
> > instead want to do is figure out what we can pass that will allow easy
> > reconstruction of those data structures. Right now, you're passing
> > the target list, the qual list, the range table, and the params, but
> > the range table doesn't seem to be getting used anywhere. I wonder if
> > we need it. If we could get away with just passing the target list
> > and qual list, and params, we'd be doing pretty well, I think. But
> > I'm not sure exactly what that looks like.
>
> IndexBuildHeapRangeScan shows how to do qual evaluation with
> relatively little setup:
>

I think even to make quals work, we need to do few extra things
like setup paramlist, rangetable. Also for quals, we need to fix
function id's by calling fix_opfuncids() and do the stuff what
ExecInit*() function does for quals. I think these extra things
will be required in processing of qualification for seq scan.

Then we need to construct projection info from target list (basically
do the stuff what ExecInit*() function does). After constructing
projectioninfo, we can call ExecProject().

Here we need to take care that functions to collect instrumentation
information like InstrStartNode(), InstrStopNode(), InstrCountFiltered1(),
etc. be called at appropriate places, so that we can collect the same for
Explain statement when requested by master backend.

Then finally after sending tuples need to destroy all the execution
state constructed for fetching tuples.

So to make this work, basically we need to do all important work
that executor does in three different phases initialization of
node, execution of node, ending the node. Ideally, we can make this
work by having code specific to just execution of sequiatial scan,
however it seems to me we again need more such kinds of code
(extracted from core part of executor) to make parallel execution of
other functionalaties like aggregation, partition seq scan, etc.

Another idea is to use Executor level interfaces (like ExecutorStart(),
ExecutorRun(), ExecutorEnd()) for execution rather than using Portal
level interfaces. I have used Portal level interfaces with the
thought that we can reuse the existing infrastructure of Portal to
make parallel execution of scrollable cursors, but as per my analysis
it is not so easy to support them especially backward scan, absolute/
relative fetch, etc, so Executor level interfaces seems more appealing
to me (something like how Explain statement works (ExplainOnePlan)).
Using Executor level interfaces will have advantage that we can reuse them
for other parallel functionalaties. In this approach, we need to take
care of constructing relavant structures (with the information passed by
master backend) required for Executor interfaces, but I think these should
be lesser than what we need in previous approach (extract seqscan specific
stuff from executor).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-02-09 07:40:16 Re: Parallel Seq Scan
Previous Message Heikki Linnakangas 2015-02-09 07:19:20 Re: RangeType internal use