From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Seq Scan |
Date: | 2015-02-07 21:30:29 |
Message-ID: | 20150207213029.GG9201@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2015-02-06 22:57:43 -0500, Robert Haas wrote:
> On Fri, Feb 6, 2015 at 2:13 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > My first comment here is that I think we should actually teach
> > heapam.c about parallelism.
>
> I coded this up; see attached. I'm also attaching an updated version
> of the parallel count code revised to use this API. It's now called
> "parallel_count" rather than "parallel_dummy" and I removed some
> stupid stuff from it. I'm curious to see what other people think, but
> this seems much cleaner to me. With the old approach, the
> parallel-count code was duplicating some of the guts of heapam.c and
> dropping the rest on the floor; now it just asks for a parallel scan
> and away it goes. Similarly, if your parallel-seqscan patch wanted to
> scan block-by-block rather than splitting the relation into equal
> parts, or if it wanted to participate in the synchronized-seqscan
> stuff, there was no clean way to do that. With this approach, those
> decisions are - as they quite properly should be - isolated within
> heapam.c, rather than creeping into the executor.
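To make the two scan strategies in the quoted paragraph concrete, here is a minimal C sketch of block-by-block allocation from a shared atomic counter, next to the equal-parts split. It is not PostgreSQL's heapam.c API: `ParallelScanShared`, `parallel_scan_next_block` and `equal_part_range` are hypothetical names, and the shared struct is assumed to live in memory that every participant can see.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared state for a parallel heap scan. In a real system this
 * would live in shared memory visible to every participating backend. */
typedef struct ParallelScanShared
{
    uint32_t    nblocks;    /* total number of blocks in the relation */
    atomic_uint next_block; /* next block not yet handed out */
} ParallelScanShared;

static void
parallel_scan_init(ParallelScanShared *shared, uint32_t nblocks)
{
    shared->nblocks = nblocks;
    atomic_init(&shared->next_block, 0);
}

/* Block-by-block allocation: each call atomically claims the next unscanned
 * block, so faster workers simply end up scanning more blocks.
 * Returns 0 and stores the block in *blkno, or -1 once the scan is done.
 * (Sketch only: the counter is assumed never to wrap around.) */
static int
parallel_scan_next_block(ParallelScanShared *shared, uint32_t *blkno)
{
    unsigned int b = atomic_fetch_add(&shared->next_block, 1u);

    if (b >= shared->nblocks)
        return -1;
    *blkno = (uint32_t) b;
    return 0;
}

/* Equal-parts alternative: worker 'worker' of 'nworkers' gets the fixed,
 * contiguous range [*start, *end); the first nblocks % nworkers workers
 * each take one extra block. */
static void
equal_part_range(uint32_t nblocks, uint32_t nworkers, uint32_t worker,
                 uint32_t *start, uint32_t *end)
{
    assert(nworkers > 0 && worker < nworkers);
    uint32_t per = nblocks / nworkers;
    uint32_t extra = nblocks % nworkers;

    *start = worker * per + (worker < extra ? worker : extra);
    *end = *start + per + (worker < extra ? 1 : 0);
}
```

The equal-parts split needs no coordination after startup, but one slow worker delays the whole scan. The shared counter costs one atomic operation per block and balances the load automatically. Keeping that choice behind a scan API is the kind of decision the quoted paragraph argues belongs in heapam.c.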
I'm not convinced that that reasoning is generally valid. While it may
work out nicely for seqscans - which might be useful enough on its own -
the more stuff we parallelize the *more* the executor will have to know
about it to make it sane. To scale well, a parallel sort, for example,
will have to execute the nodes below it in each backend, instead of
executing them in a single backend as a separate step, ferrying all
tuples to the individual backends through queues, and only then
parallelizing the sort.
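A minimal C sketch of the shape described above follows. It assumes POSIX threads stand in for backends and that an array slice stands in for the tuples each worker would get by running the plan nodes below the sort itself. Each worker sorts its own run locally, and the leader only merges the already-sorted runs. `parallel_sort` and `SortWorker` are hypothetical names, not PostgreSQL code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-worker state. Each worker owns one slice of the input,
 * standing in for the tuples produced by running the plan nodes below the
 * sort inside that worker's own backend. */
typedef struct SortWorker
{
    int      *data;    /* this worker's run (sorted in place) */
    size_t    n;       /* number of elements in the run */
    int       started; /* nonzero if a thread was actually created */
} SortWorker;

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Worker body: sort the local run with no coordination at all. */
static void *
sort_worker_main(void *arg)
{
    SortWorker *w = arg;

    qsort(w->data, w->n, sizeof(int), cmp_int);
    return NULL;
}

/* Sort data[0..n) using nworkers threads: each sorts its own run, then the
 * leader merges the sorted runs. Returns 0 on success, -1 on OOM. */
static int
parallel_sort(int *data, size_t n, int nworkers)
{
    assert(nworkers > 0);
    pthread_t  *tids = malloc((size_t) nworkers * sizeof(pthread_t));
    SortWorker *ws = malloc((size_t) nworkers * sizeof(SortWorker));
    size_t     *pos = calloc((size_t) nworkers, sizeof(size_t));
    int        *out = malloc(n * sizeof(int) + 1); /* +1 avoids malloc(0) */
    size_t      per = n / (size_t) nworkers;
    size_t      extra = n % (size_t) nworkers;
    size_t      off = 0;
    int         rc = -1;

    if (!tids || !ws || !pos || !out)
        goto done;

    for (int i = 0; i < nworkers; i++)
    {
        ws[i].data = data + off;
        ws[i].n = per + ((size_t) i < extra ? 1 : 0);
        off += ws[i].n;
        ws[i].started =
            pthread_create(&tids[i], NULL, sort_worker_main, &ws[i]) == 0;
        if (!ws[i].started)
            sort_worker_main(&ws[i]);   /* fall back: sort in the leader */
    }
    for (int i = 0; i < nworkers; i++)
        if (ws[i].started)
            pthread_join(tids[i], NULL);

    /* Leader: k-way merge by repeatedly taking the smallest run head. */
    for (size_t k = 0; k < n; k++)
    {
        int best = -1;

        for (int i = 0; i < nworkers; i++)
            if (pos[i] < ws[i].n &&
                (best < 0 || ws[i].data[pos[i]] < ws[best].data[pos[best]]))
                best = i;
        out[k] = ws[best].data[pos[best]++];
    }
    memcpy(data, out, n * sizeof(int));
    rc = 0;

done:
    free(tids);
    free(ws);
    free(pos);
    free(out);
    return rc;
}
```

The merge here scans every run head for each output element (O(n·k) for k workers). With many workers a binary heap over the run heads would be the usual choice. The point of the sketch is only that the expensive work, producing and sorting the input, happens in every worker, rather than funnelling all input through one process first.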
Now, none of that is likely to matter immediately, but I think it does
make sense to start building the infrastructure at the points where
we'll later need it.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services