Re: UniqueKey v2

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>
Subject: Re: UniqueKey v2
Date: 2024-06-04 14:14:19
Message-ID: 21674.1717510459@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> Antonin Houska <ah(at)cybertec(dot)at> writes:
>
> >> Could you make the reason clearer for adding 'List *opfamily_lists;'
> >> into UniqueKey? You said "This is needed to create ECs in the parent
> >> query if the upper relation represents a subquery." but I didn't get the
> >> it. Since we need to maintain the UniqueKey in the many places, I'd like
> >> to keep it as simple as possbile. Of course, anything essentical should
> >> be added for sure.
> >
> > If unique keys are generated for a subquery output, they also need to be
> > created for the corresponding relation in the upper query ("sub" in the
> > following example):
>
> OK.
> >
> > select * from tab1 left join (select * from tab2) sub;
> >
> > However, to create an unique key for "sub", you need an EC for each expression
> > of the key.
>
> OK.
> > And to create an EC, you in turn need the list of operator
> > families.
>
> I'm thinking if we need to "create" any EC. Can you find out a user case
> where the outer EC is missed and the UniqueKey is still interesting? I
> don't have an example now.
>
> convert_subquery_pathkeys has a similar sistuation and has the following
> codes:
>
> outer_ec =
> get_eclass_for_sort_expr(root,
> (Expr *) outer_var,
> sub_eclass->ec_opfamilies,
> sub_member->em_datatype,
> sub_eclass->ec_collation,
> 0,
> rel->relids,
> NULL,
> false);
>
> /*
> * If we don't find a matching EC, sub-pathkey isn't
> * interesting to the outer query
> */
> if (outer_ec)
> best_pathkey =
> make_canonical_pathkey(root,
> outer_ec,
> sub_pathkey->pk_opfamily,
> sub_pathkey->pk_strategy,
> sub_pathkey->pk_nulls_first);
> }

I think that convert_subquery_pathkeys() just does not try that hard to
achieve its goal.

The example where it's important to create the EC in the outer query is what I
added to the subselect.sql regression test in the 0004- diff in [1]:

create table tabx as select * from generate_series(1,100) idx;
create table taby as select * from generate_series(1,100) idy;
create unique index on taby using btree (idy);
create view view_barrier with (security_barrier=true) as select * from taby;
analyze tabx, taby;
explain (costs off, verbose on) select * from tabx x left join view_barrier y on idy = idx;

If you modify find_ec_position_matching_expr() to return -1 instead of
creating the EC, you will get this plan

Hash Left Join
Output: x.idx, taby.idy
Hash Cond: (x.idx = taby.idy)
-> Seq Scan on public.tabx x
Output: x.idx
-> Hash
Output: taby.idy
-> Seq Scan on public.taby
Output: taby.idy

instead of this

Hash Left Join
Output: x.idx, taby.idy
Inner Unique: true
Hash Cond: (x.idx = taby.idy)
-> Seq Scan on public.tabx x
Output: x.idx
-> Hash
Output: taby.idy
-> Seq Scan on public.taby
Output: taby.idy

> >> > * uniquekey_useful_for_merging()
> >> >
> >> > How does uniqueness relate to merge join? In README.uniquekey you seem to
> >> > point out that a single row is always sorted, but I don't think this
> >> > function is related to that fact. (Instead, I'd expect that pathkeys are
> >> > added to all paths for a single-row relation, but I'm not sure you do that
> >> > in the current version of the patch.)
> >>
> >> The merging is for "mergejoinable join clauses", see function
> >> eclass_useful_for_merging. Usually I think it as operator "t1.a = t2.a";
> >
> > My question is: why is the uniqueness important specifically to merge join? I
> > understand that join evaluation can be more efficient if we know that one
> > input relation is unique (i.e. we only scan that relation until we find the
> > first match), but this is not specific to merge join.
>
> So the answer is the "merging" in uniquekey_useful_for_merging() has
> nothing with merge join.

I don't understand. The function comment does mention merge join:

/*
* uniquekey_useful_for_merging
* Check if the uniquekey is useful for mergejoins above the given relation.
*
* similar with pathkeys_useful_for_merging.
*/
static bool
uniquekey_useful_for_merging(PlannerInfo *root, UniqueKey * ukey, RelOptInfo *rel)

[1] https://www.postgresql.org/message-id/7971.1713526758%40antos

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-06-04 14:16:36 Re: DROP OWNED BY fails to clean out pg_init_privs grants
Previous Message Ashutosh Bapat 2024-06-04 14:09:46 Re: Logical Replication of sequences