Re: Performance of "distinct with limit"

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Klaudie Willis <Klaudie(dot)Willis(at)protonmail(dot)com>
Cc: "luis(dot)roberto(at)siscobra(dot)com(dot)br" <luis(dot)roberto(at)siscobra(dot)com(dot)br>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance of "distinct with limit"
Date: 2020-08-29 16:23:26
Message-ID: CAMkU=1wK5t80RxVSLYEz-4e+wnHL0Tkn6DJWsgL19dnnvgv=Fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Aug 28, 2020 at 8:34 AM Klaudie Willis <
Klaudie(dot)Willis(at)protonmail(dot)com> wrote:

> No index on n, no. Index might solve it yes, but it seems to me such a
> trivial optimization even without. Obviously it is not.
>
> QUERY
> PLAN
> |
>
> ----------------------------------------------------------------------------------|
> Limit (cost=1911272.10..1911272.12 rows=2
> width=7) |
> -> HashAggregate (cost=1911272.10..1911282.45 rows=1035
> width=7) |
> Group Key:
> cfi |
> -> Seq Scan on bigtable (cost=0.00..1817446.08 rows=37530408
> width=7)|
>
>
I think it would be nice if the LIMIT functionality could be pushed down
into the HashAgg so it could stop early, I've run into this a few times.
But it just isn't implemented. It wouldn't be the hardest feature to ever
add to PostgreSQL, but it also wouldn't be trivial. It would require
coordinated changes both to the planner and to the executor.

Also, the use of LIMIT without an ORDER BY makes the query
non-deterministic, which makes it kind of a second-class citizen. There
might be more enthusiasm among experienced developers for implementing this
if it weren't for that. (Although there may be related deterministic cases
in which a similar limited hash agg could be useful.)

In the meantime, an index on "n" would probably cause it to switch to a
Unique plan which reads in index order. This plan does get to stop early.

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Klaudie Willis 2020-08-29 16:39:00 Re: Performance of "distinct with limit"
Previous Message Tom Lane 2020-08-29 15:19:53 Re: How to properly query lots of rows based on timestamps?