From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, rushabh(dot)lathia(at)gmail(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com> |
Subject: | Re: [PATCH] Keeps tracking the uniqueness with UniqueKey |
Date: | 2020-05-13 22:19:59 |
Message-ID: | CAApHDvrksJ1K45A6nhE_8h-g=8nFK0PU84Q52KK3Cmm9aezWkw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 14 May 2020 at 03:48, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>> My impression about the one row stuff, is that there is too much
>> special casing around it. We should somehow structure the UniqueKey
>> data so that one row unique keys come naturally rather than special
>> cased. E.g every column in such a case is unique in the result so
>> create as many UniqueKeys are the number of columns
>
>
> This is the beginning state of the UniqueKey, later David suggested
> this as an optimization[1], I buy-in the idea and later I found it mean
> more than the original one [2], so I think onerow is needed actually.
Having the "onerow" flag was not how I intended it to work.
Here's an example of how I thought it should work:
Assume t1 has UniqueKeys on {a}
SELECT DISTINCT a,b FROM t1;
Here the DISTINCT can be a no-op due to "a" being unique within t1. Or
more basically, {a} is a subset of {a,b}.
The code which does this is relation_has_uniquekeys_for(), which
contains the code:
+ if (list_is_subset(ukey->exprs, exprs))
+ return true;
In this case, ukey->exprs is {a} and exprs is {a,b}. So, if the
UniqueKey's exprs are a subset of, in this case, the DISTINCT exprs
then relation_has_uniquekeys_for() returns true. Basically
list_is_subset({a}, {a,b}), Answer: "Yes".
For the onerow stuff, if we can prove the relation returns only a
single row, e.g an aggregate without a GROUP BY, or there are
EquivalenceClasses with ec_has_const == true for each key of a unique
index, then why can't set just set the UniqueKeys to {}? That would
mean the code to determine if we can avoid performing an explicit
DISTINCT operation would be called with list_is_subset({}, {a,b}),
which is also true, in fact, an empty set is a subset of any set. Why
is there a need to special case that fact?
In light of those thoughts, can you explain why you think we need to
keep the onerow flag?
David
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-05-13 22:24:23 | Re: pgstat_read_statsfiles() and reset timestamp |
Previous Message | Alvaro Herrera | 2020-05-13 22:10:51 | Re: new heapcheck contrib module |