From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>, pgsql-advocacy <pgsql-advocacy(at)postgresql(dot)org>
Subject: Re: 9.6 -> 10.0
Date: 2016-04-05 21:33:59
Message-ID: CA+TgmoZBiGvnQrjN7+KKseo1cRtpgJ0EkXSyNiNMYw1SbygAFQ@mail.gmail.com
Lists: pgsql-advocacy

On Tue, Apr 5, 2016 at 10:25 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 22 March 2016 at 20:45, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> While having parallelism is awesome, it's only going to affect a
>>> (arguably small or big depending on your viewpoint) subset of users. It's
>>> going to be massive for those users, but it's not going to be useful for
>>> anywhere near as many users as streaming replication+hot standby+pg_upgrade
>>> in 9.0, or pitr+windows in 8.0. And yes, the vacuum freeze thing is also
>>> going to be great - for a small subset of users (yes, those users are in a
>>> lot of pain now).
>>
>> We don't yet have full parallel query, we only have parallel scan and
>> parallel aggregation.
>
> My comment here missed the point that parallel hash join is also now
> possible for small hash tables, so we at least have a useful subset of
> functionality across parallel scan/join/agg.
Not sure if this matters to you, but nested loops with an inner index
scan also work in parallel. The one join strategy we don't yet support
in a parallel plan is the merge join.
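
For illustration (the table and column names here are invented), a join
whose outer side is scanned in parallel while the inner side is probed
through an index can come out looking something like this:

    -- Hypothetical schema, purely for illustration.
    CREATE TABLE orders (order_id int, customer_id int);
    CREATE TABLE customers (customer_id int PRIMARY KEY, name text);

    -- On 9.6, with parallelism enabled, a join like this can produce
    -- a plan shaped roughly as follows: each worker scans part of the
    -- outer table and probes the inner index for its matches.
    --
    --   Gather
    --     Workers Planned: 2
    --     ->  Nested Loop
    --           ->  Parallel Seq Scan on orders
    --           ->  Index Scan using customers_pkey on customers
    --                 Index Cond: (customer_id = orders.customer_id)
    EXPLAIN
    SELECT o.order_id, c.name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id;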
The reason is that, while it's pretty obvious how to parallelize a
hash join or nested loop - just have each process handle some of the
rows - it's much less clear what it means to do a merge join in
parallel. In fact, you basically can't; it's an inherently serial
algorithm. My understanding of the literature in this area is that
the trick used by other systems is to do a bunch of small merge joins
instead of one big one. For example, if you have two compatibly
partitioned tables, you can merge-join each matching pair of
partitions instead of merge-joining the two appendrels as wholes.
Then you get a bunch of small merge joins that can be scheduled
across as many workers as you have. Once we have declarative
partitioning, this should be doable.
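
As a hand-written sketch of that idea (all names invented; this is the
rewrite a human would do today, not something the planner produces):

    -- a and b are both split on k with the boundary at k = 100;
    -- a_p1/a_p2 and b_p1/b_p2 stand in for their child tables.
    --
    -- Because the bounds match, joining a to b on k decomposes into
    -- two independent merge joins, each of which could be handed to
    -- a different worker:
    SELECT * FROM a_p1 JOIN b_p1 USING (k)   -- rows with k < 100
    UNION ALL
    SELECT * FROM a_p2 JOIN b_p2 USING (k);  -- rows with k >= 100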
There's more that can be done: given two tables partitioned
incompatibly, or one partitioned table and one unpartitioned table,
the literature talks about repartitioning one of the tables on the
fly to match the existing partitioning scheme of the other, after
which you again have N separate merge joins that you can schedule
across your pool of workers.
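
In miniature, again with invented names, and written by hand rather
than done on the fly by the executor:

    -- a is split on k at k = 100; b is not partitioned at all.
    -- Repartition b to match a's existing bounds...
    CREATE TEMP TABLE b_p1 AS SELECT * FROM b WHERE k < 100;
    CREATE TEMP TABLE b_p2 AS SELECT * FROM b WHERE k >= 100;
    -- ...after which a_p1 JOIN b_p1 and a_p2 JOIN b_p2 are two
    -- independent merge joins that can run in separate workers.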
What's not clear to me is whether trying to get this sort of thing
working is the best use of developer time. At least in the short
term, I think there are other parallel query limitations more in need
of being lifted.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company