Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Floris Van Nee <florisvannee(at)optiver(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "rushabh(dot)lathia(at)gmail(dot)com" <rushabh(dot)lathia(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date: 2020-12-05 18:40:33
Message-ID: a517310b-61c7-5190-ac22-9d82aac1270e@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 05/12/2020 17:10, Andy Fan wrote:
> Actually I can't understand this, could you explain more?  Based on my
> current
> knowledge,  when we run "SELECT DISTINCT a FROM t",  we never care about
> which operator to use for the unique.

SortGroupClause includes 'eqop' field, which determines the operator
that the expression needs to made unique with. The syntax doesn't let
you set it to anything else than the default btree opclass of the
datatype, though. But you can specify it for ORDER BY, and we use
SortGroupClauses to represent both sorting and grouping.

Also, if you use the same struct to also represent columns that you know
to be unique, and not just the DISTINCT clause in the query, then you
need the operator. For example, if you create a unique index on
non-default opfamily.

> There's some precedence for PathKeys, as we generate PathKeys to
> represent the DISTINCT column in PlannerInfo->distinct_pathkeys. On the
> other hand, I've always found it confusing that we use PathKeys to
> represent DISTINCT and GROUP BY, which are not actually sort orderings.
>
>
> OK, I have the same confusion  now:)
>
> Perhaps it would  make sense to store EquivalenceClass+opfamily in
> UniqueKey, and also replace distinct_pathkeys and group_pathkeys with
> UniqueKeys.
>
>
> I can understand why we need EquivalenceClass for UniqueKey, but I can't
> understand why we need opfamily here.

Thinking a bit harder, I guess we don't. Because EquivalenceClass
includes the operator family already, in the ec_opfamilies field.

> For anyone who is interested with these patchsets, here is my plan
> about this now. 1). I will try EquivalenceClass rather than Expr in
> UniqueKey and add opfamily if needed. 2). I will start a new thread
> to continue this topic. The current thread is too long which may
> scare some people who may have interest in it. 3). I will give up
> patch 5 & 6 for now. one reason I am not happy with the current
> implementation, and the other reason is I want to make the patchset
> smaller to make the reviewer easier. I will not give up them forever,
> after the main part of this patchset is committed, I will continue
> with them in a new thread. Thanks everyone for your input.
Sounds like a plan.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-12-05 19:59:41 Re: should INSERT SELECT use a BulkInsertState?
Previous Message Tom Lane 2020-12-05 18:03:43 Re: Change definitions of bitmap flags to bit-shifting style