Re: [PATCH] Erase the distinctClause if the result is unique by definition

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date: 2020-03-15 17:01:11
Message-ID: CAKU4AWqjkab8uV18r2WRvasyJTT2=khwLP=ynUV7MePQ1QUjXw@mail.gmail.com
Lists: pgsql-hackers

Hi All:

I have re-implemented the patch based on David's suggestion/code, and it
looks like it works well. The updated patch mainly includes:

1. Maintain not_null_colno in RelOptInfo, which includes both the NOT NULL
   columns from the catalog and the not-null ones derived from vars.
2. Add the restrictinfo check in populate_baserel_uniquekeys. If we are
   sure that only 1 row is returned, I add each expr in
   rel->reltarget->exprs as a unique key. For example, for
   (select a, b, c from t where pk = 1), the UK will be ( (a), (b), (c) ).
3. Postpone the propagate_unique_keys_to_joinrel call to
   populate_joinrel_with_paths, since we know the jointype at that time,
   so we can handle semi/anti joins specially.
4. Add the rule I suggested above: if both of the 2 relations yield a
   unique result, the join result will be unique as well. The UK can be
   ( (rel1_uk1, rel1_uk2).. )
5. If a unique key cannot possibly be referenced by others, we can safely
   ignore it in order to keep (join)rel->uniquekeys short.
6. I only consider the not-null check/opfamily check for unique keys that
   come from a unique index. I think that should be correct.
7. I defined each uniquekey as a List of Expr, so I didn't introduce a new
   node type.
8. Check the uniquekeys information before create_distinct_paths and
   create_group_paths: skip creating the new paths if the
   sortgroupclauses is already unique; otherwise create them and add the
   new uniquekey to the distinctrel/grouprel.
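
To make items 1, 2, 4, 5 and 8 more concrete, here is a toy model in Python.
It is only a sketch, not code from the patch: the Rel class, the use of
plain strings for expressions, and the function bodies are all assumptions
made for illustration. In particular, the join rule is written under the
reading that each join unique key combines one key from each side, and
item 5 is approximated as "drop keys that are not fully in the target list".

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

UniqueKey = FrozenSet[str]  # one unique key = a set of expression names


@dataclass
class Rel:
    """Hypothetical stand-in for RelOptInfo (not the patch's structure)."""
    target: List[str]                                  # output expressions
    not_null: Set[str] = field(default_factory=set)    # NOT NULL per catalog
    uniquekeys: List[UniqueKey] = field(default_factory=list)


def populate_baserel_uniquekeys(rel, unique_indexes, eq_const_cols):
    """Derive unique keys for a base relation (toy version).

    unique_indexes: list of column lists, one per unique index.
    eq_const_cols:  columns restricted by a `col = constant` qual.
    """
    eq_cols = set(eq_const_cols)
    # Item 1: a strict equality qual also proves the column is not null.
    not_null = rel.not_null | eq_cols
    usable = [set(idx) for idx in unique_indexes
              if all(col in not_null for col in idx)]

    # Item 2: if a usable unique index is fully matched by equality quals,
    # at most one row comes back, so every target expression is unique.
    if any(idx <= eq_cols for idx in usable):
        rel.uniquekeys = [frozenset([expr]) for expr in rel.target]
        return

    # Item 5 (approximation): keep only keys visible in the target list.
    visible = set(rel.target)
    rel.uniquekeys = [frozenset(idx) for idx in usable if idx <= visible]


def join_uniquekeys(outer, inner):
    """Item 4: both sides unique => each (outer key, inner key) pair is unique."""
    return [ok | ik for ok in outer.uniquekeys for ik in inner.uniquekeys]


def distinct_is_redundant(rel, distinct_exprs):
    """Item 8: DISTINCT/GROUP BY adds nothing if it covers some unique key."""
    wanted = set(distinct_exprs)
    return any(uk <= wanted for uk in rel.uniquekeys)


# select a, b, c from t where pk = 1
one_row = Rel(target=["a", "b", "c"], not_null={"pk"})
populate_baserel_uniquekeys(one_row, [["pk"]], eq_const_cols=["pk"])

# select pk, a from t
plain = Rel(target=["pk", "a"], not_null={"pk"})
populate_baserel_uniquekeys(plain, [["pk"]], eq_const_cols=[])
```

In this model, one_row ends up with the keys ( (a), (b), (c) ) from the
example in item 2, and distinct_is_redundant(plain, ["pk", "a"]) reports
that a DISTINCT over (pk, a) can be skipped, while a DISTINCT over (a)
alone cannot.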

There are some things still in progress, like:
1. Partitioned tables.
2. UNION/UNION ALL.
3. Maybe refactor is_innerrel_unique_for/query_is_distinct_for to use
   UniqueKey.
4. If we are sure the GROUP BY clause is unique and we have an aggregate
   call, maybe we should try Bapat's suggestion: we can use sort rather
   than hash. The strategy sounds awesome, but I haven't checked the
   details so far.
5. A clearer commit message.
6. Anything more?

Any feedback is welcome. Thanks for any ideas, suggestions, and demo
code!

Best Regards
Andy Fan

Attachment Content-Type Size
v4-0001-Patch-Bypass-distinctClause-groupbyClause-if-the-.patch application/octet-stream 45.2 KB
