Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, rushabh(dot)lathia(at)gmail(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Subject:	Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date:	2020-05-13 22:19:59
Message-ID:	CAApHDvrksJ1K45A6nhE_8h-g=8nFK0PU84Q52KK3Cmm9aezWkw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 14 May 2020 at 03:48, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>> My impression about the one row stuff, is that there is too much
>> special casing around it. We should somehow structure the UniqueKey
>> data so that one row unique keys come naturally rather than special
>> cased. E.g. every column in such a case is unique in the result, so
>> create as many UniqueKeys as the number of columns
>
>
> This was the beginning state of the UniqueKey. Later David suggested
> this as an optimization [1]. I bought into the idea, and later I found it
> means more than the original one [2], so I think onerow is actually needed.

Having the "onerow" flag was not how I intended it to work.

Here's an example of how I thought it should work:

Assume t1 has UniqueKeys on {a}

SELECT DISTINCT a,b FROM t1;

Here the DISTINCT can be a no-op due to "a" being unique within t1. Or
more basically, {a} is a subset of {a,b}.

The code which does this is relation_has_uniquekeys_for(), which
contains the code:

+ if (list_is_subset(ukey->exprs, exprs))
+     return true;

In this case, ukey->exprs is {a} and exprs is {a,b}. So, if the
UniqueKey's exprs are a subset of, in this case, the DISTINCT exprs,
then relation_has_uniquekeys_for() returns true. In other words, is
list_is_subset({a}, {a,b}) true? Answer: "Yes".

For the onerow stuff, if we can prove the relation returns only a
single row, e.g. an aggregate without a GROUP BY, or there are
EquivalenceClasses with ec_has_const == true for each key of a unique
index, then why can't we just set the UniqueKeys to {}? That would
mean the code to determine if we can avoid performing an explicit
DISTINCT operation would be called with list_is_subset({}, {a,b}),
which is also true. In fact, an empty set is a subset of any set. Why
is there a need to special case that fact?
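
To make the subset argument concrete, here is a minimal standalone sketch.
It is not the patch's code: exprs_is_subset() is a hypothetical stand-in
for list_is_subset(), and expressions are modelled as plain column-name
strings rather than planner nodes. It only shows that an ordinary subset
test already returns true for an empty key, with no dedicated branch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the patch's list_is_subset(): returns true
 * when every entry of sub[] also appears in super[].  Expressions are
 * modelled as column-name strings, which is an assumption made purely
 * for illustration.
 */
static bool
exprs_is_subset(const char *const *sub, size_t nsub,
                const char *const *super, size_t nsuper)
{
    for (size_t i = 0; i < nsub; i++)
    {
        bool found = false;

        for (size_t j = 0; j < nsuper; j++)
        {
            if (strcmp(sub[i], super[j]) == 0)
            {
                found = true;
                break;
            }
        }
        if (!found)
            return false;
    }

    /* Also reached when nsub == 0: the empty set is a subset of any set. */
    return true;
}
```

Called with the key {a} against the DISTINCT list {a,b}, this returns
true. Called with an empty key (nsub == 0) against {a,b}, the outer loop
never runs and it returns true as well, which is the onerow case falling
out of the general rule.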

In light of those thoughts, can you explain why you think we need to
keep the onerow flag?

David
