Re: [PATCH] Lazy hashaggregate when no aggregation is needed

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Francois Deliege <fdeliege(at)gmail(dot)com>
Subject: Re: [PATCH] Lazy hashaggregate when no aggregation is needed
Date: 2012-06-18 18:11:54
Message-ID: CA+TgmoanPggSnpMgA5J15EQjm-+7xdq6FaFwkcnaRSXT8A7uAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 15, 2012 at 9:22 AM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
> Exactly. I think the first question for this patch should be whether
> this use-case is worth the complexity of the patch. I can't imagine
> any really compelling use cases that need an arbitrary distinct subset
> of results.

Me neither.

> The original complaint on -performance [1], didn't specify
> a real world use case, but it seemed to be a case of an ORM generating
> suboptimal queries. On the other hand, the patch itself is in my
> opinion rather simple, so it might be worth it.

Yeah.

> It has one outstanding issue, query_planner chooses the cheapest path
> based on total cost. This can be suboptimal when that path happens to
> have high startup cost. It seems to me that enabling the query_planner
> to find the cheapest unsorted path returning a limited amount of
> tuples would require some major surgery to the planner. To be clear,
> this is only a case of missed optimization, not a regression.

I'm confused by this remark, because surely the query planner does it
this way only if there's no LIMIT. When there is a LIMIT, we choose
based on the startup cost plus the estimated fraction of the total
cost we expect to pay based on dividing the LIMIT by the overall row
count estimate. Or is this not what you're talking about?

> It won't help set returning functions because the tuplestore for those
> is fully materialized when the first row is fetched.

That's surely a problem for another day.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-06-18 18:13:12 Re: WAL format changes
Previous Message Heikki Linnakangas 2012-06-18 18:08:14 Re: WAL format changes