From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Markus Wanner <markus(at)bluegap(dot)ch> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Yuri Levinsky <yuril(at)celltick(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hash partitioning. |
Date: | 2013-06-27 21:13:02 |
Message-ID: | CAMkU=1wsR3DvvsxmAsoiQAHGnW+_UFETcQsig-MP6JL9cK098Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jun 26, 2013 at 8:55 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> On 06/26/2013 05:46 PM, Heikki Linnakangas wrote:
> > We could also allow a large query to search a single table in parallel.
> > A seqscan would be easy to divide into N equally-sized parts that can be
> > scanned in parallel. It's more difficult for index scans, but even then
> > it might be possible at least in some limited cases.
>
> So far reading sequentially is still faster than hopping between
> different locations. Purely from the I/O perspective, that is.
>
Wouldn't any IO system being used on a high-end system be fairly good about
making this work through interleaved read-ahead algorithms? Also,
hopefully the planner would be able to predict when parallelization has
nothing to add and avoid using it, although surely that is easier said than
done.
>
> For queries where the single CPU core turns into a bottle-neck and which
> we want to parallelize, we should ideally still do a normal, fully
> sequential scan and only fan out after the scan and distribute the
> incoming pages (or even tuples) to the multiple cores to process.
>
That sounds like it would be much more susceptible to lock contention, and
harder to get bug-free, than dividing into bigger chunks, like whole 1 gig
segments.
Fanning out line by line (according to line_number % number_processes) was
my favorite parallelization method in Perl, but those files were read only
and so had no concurrency issues.
Cheers,
Jeff
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2013-06-27 21:20:51 | Re: Hash partitioning. |
Previous Message | Josh Berkus | 2013-06-27 21:12:29 | Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review] |