Re: UniqueKey v2

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>
Subject: Re: UniqueKey v2
Date: 2024-05-13 17:42:15
Message-ID: 4986.1715622135@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> > >
> > > * Combining the UKs
> > >
> > > IMO this is the most problematic part of the patch. You call
> > > populate_joinrel_uniquekeys() for the same join multiple times,
> >
> > Why do you think so? The below code is called in "make_join_rel"
>
> Consider join of tables "a", "b" and "c". My understanding is that
> make_join_rel() is called once with rel1={a} and rel2={b join c}, then with
> rel1={a join b} and rel2={c}, etc. I wanted to say that each call should
> produce the same set of unique keys.
>
> I need to check this part more in detail.

I think I understand now. By calling populate_joinrel_uniquekeys() for various
orderings, you can find out that various input relation unique keys can
represent the whole join. For example, if the ordering is

A JOIN (B JOIN C)

you can prove that the unique keys of A can be used for the whole join, while
for the ordering

B JOIN (A JOIN C)

you can prove the same for the unique keys of B, and so on.

> > Is your original question is about populate_joinrel_uniquekey_for_rel
> > rather than populate_joinrel_uniquekeys? We have the below codes:
> >
> > outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, outerrel,
> > innerrel, restrictlist);
> > inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, innerrel,
> > outerrel, restrictlist);
> >
> > This is mainly because of the following theory. Quoted from
> > README.uniquekey. Let's called this as "rule 1".
> >
> > """
> > How to maintain the uniquekey?
> > -------------------------------
> > .. From the join relation, it is maintained with two rules:
> >
> > - the uniquekey in one side is still unique if it can't be duplicated
> > after the join. for example:
> >
> > SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk;
> > UniqueKey on t1: t1.pk
> > UniqueKey on t1 Join t2: t1.pk
> > """
> >
> > AND the blow codes:
> >
> >
> > if (outeruk_still_valid || inneruk_still_valid)
> >
> > /*
> > * the uniquekey on outers or inners have been added into joinrel so
> > * the combined uniuqekey from both sides is not needed.
> > */
> > return;
> >
> >
> > We don't create the component uniquekey if any one side of the boths
> > sides is unique already. For example:
> >
> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
> > unique", there is no need to create component UniqueKey (t1.id, t2.id);
>
> ok, I need to check more in detail how this part works.

This optimization makes sense to me.

> > >
> > > Of course one problem is that the number of combinations can grow
> > > exponentially as new relations are joined.
> >
> > Yes, that's why "rule 1" needed and "How to reduce the overhead" in
> > UniqueKey.README is introduced.

I think there should yet be some guarantee that the number of unique keys does
not grow exponentially. Perhaps a constant that allows a relation (base or
join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3
or 4.) And when picking the "best N keys", one criterion could be the number
of expressions in the key (the shorter key the better).

> > >
> > > 2) Check if the join relation is single-row
> > >
> > > I in order to get rid of the dependency on 'restrictlist', I think you can
> > > use ECs. Consider a query from your regression tests:
> > >
> > > CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c int, d int, e int);
> > >
> > > SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id = 1;
> > >
> > > The idea here seems to be that no more than one row of t1 matches the query
> > > clauses. Therefore, if t2(id) is unique, the clause t1.e=t2.id ensures that
> > > no more than one row of t2 matches the query (because t1 cannot provide the
> > > clause with more than one input value of 'e'). And therefore, the join also
> > > produces at most one row.
> >
> > You are correct and IMO my current code are able to tell it is a single
> > row as well.
> >
> > 1. Since t1.id = 1, so t1 is single row, so t1.d is unqiuekey as a
> > consequence.
> > 2. Given t2.id is unique, t1.e = t2.id so t1's unqiuekey can be kept
> > after the join because of rule 1 on joinrel. and t1 is singlerow, so the
> > joinrel is singlerow as well.

ok, I think I understand now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2024-05-13 18:02:05 Re: Fix out-of-bounds in the function GetCommandTagName
Previous Message Jelte Fennema-Nio 2024-05-13 17:41:58 Re: Direct SSL connection with ALPN and HBA rules