From: | Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Partition-wise aggregation/grouping |
Date: | 2017-11-01 12:50:11 |
Message-ID: | CAM2+6=X8VFpCvq=d2vYMxmTEmeC-Fb-_NwbZgvSALuBrDzU3hA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Oct 28, 2017 at 3:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Oct 27, 2017 at 1:01 PM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> > 1. Added separate patch for costing Append node as discussed up-front in
> the
> > patch-set.
> > 2. Since we now cost Append node, we don't need
> > partition_wise_agg_cost_factor
> > GUC. So removed that. The remaining patch hence merged into main
> > implementation
> > patch.
> > 3. Updated rows in test-cases so that we will get partition-wise plans.
>
> With 0006 applied, cost_merge_append() is now a little bit confused:
>
> /*
> * Also charge a small amount (arbitrarily set equal to operator cost)
> per
> * extracted tuple. We don't charge cpu_tuple_cost because a
> MergeAppend
> * node doesn't do qual-checking or projection, so it has less overhead
> * than most plan nodes.
> */
> run_cost += cpu_operator_cost * tuples;
>
> /* Add MergeAppend node overhead like we do it for the Append node */
> run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
>
> The first comment says that we don't add cpu_tuple_cost, and the
> second one then adds half of it anyway.
>
Yep.
But as David reported earlier, if we remove the first part i.e. adding
cpu_operator_cost per tuple, Merge Append will be preferred over an Append
node unlike before. And thus, I thought of better having both, but no so
sure. Should we remove that part altogether, or add both in a single
statement with updated comments?
> I think it's fine to have a #define for DEFAULT_APPEND_COST_FACTOR,
> because as you say it's used twice, but I don't think that should be
> exposed in cost.h; I'd make it private to costsize.c and rename it to
> something like APPEND_CPU_COST_MULTIPLIER. The word DEFAULT, in
> particular, seems useless to me, since there's no provision for it to
> be overridden by a different value.
>
Agree. Will make that change.
>
> What testing, if any, can we think about doing with this plan to make
> sure it doesn't regress things? For example, if we do a TPC-H run
> with partitioned tables and partition-wise join enabled, will any
> plans change with this patch?
I have tried doing this on my local developer machine. For 1GB database
size (tpc-h scale factor 1), I see no plan change with and without this
patch.
I have tried with scale factor 10, but query is not executing well due to
space and memory constraints. Can someone try out that?
> Do they get faster or not? Anyone have
> other ideas for what to test?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Ildus Kurbangaliev | 2017-11-01 13:24:16 | proposal: extend shm_mq to support more use cases |
Previous Message | Chris Travers | 2017-11-01 12:43:35 | Re: Oracle to PostGre |