From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Allow DISTINCT to use Incremental Sort |
Date: | 2023-01-10 02:13:43 |
Message-ID: | CAApHDvqgejygRzqP1Scpdvor+aWye8qeyO3nA3YKBAeV7XZvGQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for having a look at this.
On Tue, 10 Jan 2023 at 02:28, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> +1 for the changes. A minor comment is that previously on HEAD for
> SELECT DISTINCT case, if we have to do an explicit full sort atop the
> cheapest path, we try to make sure to always use the more rigorous
> ordering.
I'm not quite sure I follow what's changed here. As far as I see it
the code still does what it did and uses the more rigorous sort.
postgres=# explain (costs off) select distinct on (relname) * from
pg_Class order by relname, oid;
QUERY PLAN
----------------------------------
Unique
-> Sort
Sort Key: relname, oid
-> Seq Scan on pg_class
If it didn't, then there'd have been a Sort atop of the Unique to
ORDER BY relname,oid in the above.
Maybe you were looking at create_partial_distinct_paths()? We don't do
anything there for DISTINCT ON, although perhaps we could. Just not
for this patch.
>
> /* For explicit-sort case, always use the more rigorous clause */
> if (list_length(root->distinct_pathkeys) <
> list_length(root->sort_pathkeys))
> {
> needed_pathkeys = root->sort_pathkeys;
> /* Assert checks that parser didn't mess up... */
> Assert(pathkeys_contained_in(root->distinct_pathkeys,
> needed_pathkeys));
> }
> else
> needed_pathkeys = root->distinct_pathkeys;
>
> I'm not sure if this is necessary, as AFAIU the parser should have
> ensured that the sortClause is always a prefix of distinctClause.
I think that's true for standard DISTINCT, but it's not for DISTINCT ON.
> In the patch this code has been removed. I think we should also remove
> the related comments in create_final_distinct_paths.
>
> * When we have DISTINCT ON, we must sort by the more rigorous of
> * DISTINCT and ORDER BY, else it won't have the desired behavior.
> - * Also, if we do have to do an explicit sort, we might as well use
> - * the more rigorous ordering to avoid a second sort later. (Note
> - * that the parser will have ensured that one clause is a prefix of
> - * the other.)
I'm not quite following what you think has changed here.
> Also, the comment just above this one is outdated too.
>
> * First, if we have any adequately-presorted paths, just stick a
> * Unique node on those. Then consider doing an explicit sort of the
> * cheapest input path and Unique'ing that.
>
> The two-step workflow is what is the case on HEAD but not any more in
> the patch. And I think we should mention incremental sort on any paths
> with presorted keys.
Yeah, I agree, that comment should mention incremental sort.
I've attached an updated patch
David
Attachment | Content-Type | Size |
---|---|---|
incremental_sort_for_distinct_v2.patch | text/plain | 15.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Julien Rouhaud | 2023-01-10 02:20:25 | Re: Schema variables - new implementation for Postgres 15 (typo) |
Previous Message | Andres Freund | 2023-01-10 02:07:49 | Re: refactoring relation extension and BufferAlloc(), faster COPY |