Re: Allow DISTINCT to use Incremental Sort

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow DISTINCT to use Incremental Sort
Date: 2023-01-10 02:13:43
Message-ID: CAApHDvqgejygRzqP1Scpdvor+aWye8qeyO3nA3YKBAeV7XZvGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for having a look at this.

On Tue, 10 Jan 2023 at 02:28, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> +1 for the changes. A minor comment is that previously on HEAD for
> SELECT DISTINCT case, if we have to do an explicit full sort atop the
> cheapest path, we try to make sure to always use the more rigorous
> ordering.

I'm not quite sure I follow what's changed here. As far as I see it
the code still does what it did and uses the more rigorous sort.

postgres=# explain (costs off) select distinct on (relname) * from
pg_Class order by relname, oid;
QUERY PLAN
----------------------------------
Unique
-> Sort
Sort Key: relname, oid
-> Seq Scan on pg_class

If it didn't, then there'd have been a Sort atop of the Unique to
ORDER BY relname,oid in the above.

Maybe you were looking at create_partial_distinct_paths()? We don't do
anything there for DISTINCT ON, although perhaps we could. Just not
for this patch.

>
> /* For explicit-sort case, always use the more rigorous clause */
> if (list_length(root->distinct_pathkeys) <
> list_length(root->sort_pathkeys))
> {
> needed_pathkeys = root->sort_pathkeys;
> /* Assert checks that parser didn't mess up... */
> Assert(pathkeys_contained_in(root->distinct_pathkeys,
> needed_pathkeys));
> }
> else
> needed_pathkeys = root->distinct_pathkeys;
>
> I'm not sure if this is necessary, as AFAIU the parser should have
> ensured that the sortClause is always a prefix of distinctClause.

I think that's true for standard DISTINCT, but it's not for DISTINCT ON.

> In the patch this code has been removed. I think we should also remove
> the related comments in create_final_distinct_paths.
>
> * When we have DISTINCT ON, we must sort by the more rigorous of
> * DISTINCT and ORDER BY, else it won't have the desired behavior.
> - * Also, if we do have to do an explicit sort, we might as well use
> - * the more rigorous ordering to avoid a second sort later. (Note
> - * that the parser will have ensured that one clause is a prefix of
> - * the other.)

I'm not quite following what you think has changed here.

> Also, the comment just above this one is outdated too.
>
> * First, if we have any adequately-presorted paths, just stick a
> * Unique node on those. Then consider doing an explicit sort of the
> * cheapest input path and Unique'ing that.
>
> The two-step workflow is what is the case on HEAD but not any more in
> the patch. And I think we should mention incremental sort on any paths
> with presorted keys.

Yeah, I agree, that comment should mention incremental sort.

I've attached an updated patch

David

Attachment Content-Type Size
incremental_sort_for_distinct_v2.patch text/plain 15.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2023-01-10 02:20:25 Re: Schema variables - new implementation for Postgres 15 (typo)
Previous Message Andres Freund 2023-01-10 02:07:49 Re: refactoring relation extension and BufferAlloc(), faster COPY