Re: parallelism and sorting

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelism and sorting
Date: 2015-11-25 02:23:32
Message-ID: CA+Tgmoak-TnVzeyZ3iG_j-mg-Xo9xYmY5rPmiqy7y=_oSfwGFw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 24, 2015 at 7:44 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 11/23/15 5:47 PM, Robert Haas wrote:
>> 2. In Parallel Seq Scan, the determination of what page to scan next
>> isn't dependent on the contents of any page previously scanned. In
>> Parallel Index Scan, it is. Therefore, the amount of effective
>> parallelism is likely to be less. This doesn't mean that trying to
>> parallelize things here is worthless: one backend can be fetching the
>> next index page while some other backend is processing the tuples from
>> a page previously read.
>
> Presumably we could simulate that today by asking the kernel for the next
> page in advance, like we do for seqscans when effective_io_concurrency > 1.

We don't do any such thing. We prefetch for bitmap heap scans, not seq scans.

> My guess is a parallel worker won't help there.
> Where a parallel worker might provide a lot of benefit is separating index
> scanning from heap scanning (to check visibility or satisfy a filter). It
> wouldn't surprise me if a single worker reading an index could keep a number
> of children busy retrieving heap tuples and processing them.

Fortunately, the design I'm describing permits that exact thing.

> It might be
> nice if an index scan node just fired up its own workers and talked to them
> directly.

That would be a bad idea, I'm pretty sure. Passing tuples between
workers is expensive and needs to be minimized. I am quite confident
that the right model for making parallelism better is to push as much
stuff beneath the Gather node as possible - that is, each worker
should have as many different things as possible that it can do
without incurring communication overhead. Single purpose workers that
only assist with one part of the computation and then relay data to
some other process are exactly what we want to avoid.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
