Quick Links

Re: [PATCH] Lazy hashaggregate when no aggregation is needed

From:	"Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To:	"'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Ants Aasma'" <ants(at)cybertec(dot)at>
Cc:	"'Jay Levitt'" <jay(dot)levitt(at)gmail(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>, "'Francois Deliege'" <fdeliege(at)gmail(dot)com>
Subject:	Re: [PATCH] Lazy hashaggregate when no aggregation is needed
Date:	2012-06-19 09:41:49
Message-ID:	004a01cd4dff$bffd9a40$3ff8cec0$@lab.ntt.co.jp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Tuesday, June 19, 2012 3:12 AM
> To: Ants Aasma
> Cc: Etsuro Fujita; Jay Levitt; Tom Lane; PostgreSQL-development; Francois
> Deliege
> Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is
> needed
>
> On Fri, Jun 15, 2012 at 9:22 AM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
> > Exactly. I think the first question for this patch should be whether
> > this use-case is worth the complexity of the patch. I can't imagine
> > any really compelling use cases that need an arbitrary distinct subset
> > of results.
>
> Me neither.
>
> > The original complaint on -performance [1], didn't specify a real
> > world use case, but it seemed to be a case of an ORM generating
> > suboptimal queries. On the other hand, the patch itself is in my
> > opinion rather simple, so it might be worth it.
>
> Yeah.
>
> > It has one outstanding issue, query_planner chooses the cheapest path
> > based on total cost. This can be suboptimal when that path happens to
> > have high startup cost. It seems to me that enabling the query_planner
> > to find the cheapest unsorted path returning a limited amount of
> > tuples would require some major surgery to the planner. To be clear,
> > this is only a case of missed optimization, not a regression.
>
> I'm confused by this remark, because surely the query planner does it this
> way only if there's no LIMIT. When there is a LIMIT, we choose based on
> the startup cost plus the estimated fraction of the total cost we expect
> to pay based on dividing the LIMIT by the overall row count estimate. Or
> is this not what you're talking about?

I think that Ants is pointing the way of estimating costs in
choose_hashed_grouping()/choose_hashed_distinct(), ie cost_agg() for
cheapest_path + hashagg, where the costs are calculated based on the total
cost only of cheapest_path. I think that it might be good to do cost_agg()
for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
strategy.

> > It won't help set returning functions because the tuplestore for those
> > is fully materialized when the first row is fetched.
>
> That's surely a problem for another day.

Best regards,
Etsuro Fujita

In response to

Re: [PATCH] Lazy hashaggregate when no aggregation is needed at 2012-06-18 18:11:54 from Robert Haas

Responses

Re: [PATCH] Lazy hashaggregate when no aggregation is needed at 2012-06-22 14:12:58 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alex Shulgin	2012-06-19 10:04:56	Re: Libxml2 load error on Windows
Previous Message	Amit Kapila	2012-06-19 08:46:46	Re: Allow WAL information to recover corrupted pg_controldata