Re: Aggregates push-down to partitions

From: Maksim Milyutin <milyutinma(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Aggregates push-down to partitions
Date: 2017-11-09 18:50:04
Message-ID: 73ca1970-d182-0889-4d63-ec369a68d788@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Konstantin!

09.11.17 20:14, Konstantin Knizhnik wrote:
> It is still far from ideal plan because each worker is working with
> all partitions, instead of spitting partitions between workers and
> calculate partial aggregates for each partition.
>
> But if we add FDW as a child of parent table, then parallel scan can
> not be used and we get the worst possible plan:
>
> postgres=# create foreign table derived_fdw() inherits(base) server
> pg_fdw options (table_name 'derived1');CREATE FOREIGN TABLE
> postgres=# explain select sum(x) from base;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>
>  Aggregate  (cost=34055.07..34055.08 rows=1 width=8)
>    ->  Append  (cost=0.00..29047.75 rows=2002926 width=4)
>          ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=4)
>          ->  Seq Scan on derived1  (cost=0.00..14425.00 rows=1000000
> width=4)
>          ->  Seq Scan on derived2  (cost=0.00..14425.00 rows=1000000
> width=4)
>          ->  Foreign Scan on derived_fdw  (cost=100.00..197.75
> rows=2925 width=4)
> (6 rows)
>
> So we sequentially pull all data to this node and compute aggregates
> locally.
> Ideal plan will calculate in parallel partial aggregates at all nodes
> and then combine partial results.
> It requires two changes:
> 1. Replace Aggregate->Append with
> Finalize_Aggregate->Append->Partial_Aggregate
> 2. Concurrent execution of Append. It also can be done in two
> different ways: we can try to use existed parallel workers
> infrastructure and
> replace Append with Gather. It seems to be the best approach for local
> partitioning. In case of remote (FDW) partitions, it is enough
> to split starting of execution (PQsendQuery in postgres_fdw) and
> getting results. So it requires some changes in FDW protocol.
>
>
> I wonder if somebody already investigate this problem or working in
> this direction.
> May be there are already some patches proposed?
> I have searched hackers archive, but didn't find something relevant...
> Are there any suggestions about the best approach to implement this
> feature?
>

Maybe in this thread[1] your described problem are solved through
introducing Parallel Append node?

1.
https://www.postgresql.org/message-id/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1%2BS%2BvRuUQ%40mail.gmail.com

--
Regards,
Maksim Milyutin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2017-11-09 18:52:08 Re: Simplify ACL handling for large objects and removal of superuser() checks
Previous Message Robert Haas 2017-11-09 18:39:34 Re: Simplify ACL handling for large objects and removal of superuser() checks