Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc:	Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Floris Van Nee <florisvannee(at)optiver(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "rushabh(dot)lathia(at)gmail(dot)com" <rushabh(dot)lathia(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject:	Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date:	2020-12-05 18:40:33
Message-ID:	a517310b-61c7-5190-ac22-9d82aac1270e@iki.fi
Lists:	pgsql-hackers

On 05/12/2020 17:10, Andy Fan wrote:
> Actually I can't understand this, could you explain more? Based on my
> current knowledge, when we run "SELECT DISTINCT a FROM t", we never care
> about which operator to use for the unique.

SortGroupClause includes an 'eqop' field, which determines the operator
that the expression needs to be made unique with. The syntax doesn't let
you set it to anything other than the default btree opclass of the
datatype, though. But you can specify it for ORDER BY, and we use
SortGroupClauses to represent both sorting and grouping.
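
As a rough illustration of the above, here is a trimmed-down sketch of the struct. The field names follow SortGroupClause, but the typedefs are stand-ins rather than the real PostgreSQL headers, the NodeTag is omitted, and the two helper functions are hypothetical, added only to show how the eqop and sortop fields would be consulted.

```c
#include <stdbool.h>

/* Simplified stand-ins for PostgreSQL's typedefs; not the real headers. */
typedef unsigned int Oid;
typedef unsigned int Index;
#define InvalidOid ((Oid) 0)

/* Trimmed-down version of SortGroupClause (NodeTag omitted). */
typedef struct SortGroupClause
{
    Index tleSortGroupRef; /* reference into the targetlist */
    Oid   eqop;            /* equality operator used for grouping/DISTINCT */
    Oid   sortop;          /* ordering operator, or InvalidOid if none */
    bool  nulls_first;     /* do NULLs sort before normal values? */
    bool  hashable;        /* can eqop be implemented by hashing? */
} SortGroupClause;

/* Hypothetical helper: do two clauses agree on what "equal" means? */
static bool
same_equality(const SortGroupClause *a, const SortGroupClause *b)
{
    return a->eqop != InvalidOid && a->eqop == b->eqop;
}

/* Hypothetical helper: does this clause carry a sort ordering? */
static bool
clause_is_sortable(const SortGroupClause *c)
{
    return c->sortop != InvalidOid;
}
```

The point of the sketch is only that one struct carries both an equality operator and an optional ordering operator, so a DISTINCT or GROUP BY entry can leave sortop unset while still saying which equality it means.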

Also, if you use the same struct to also represent columns that you know
to be unique, and not just the DISTINCT clause in the query, then you
need the operator. For example, if you create a unique index on a
non-default opfamily.
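
To make the "unique under which operator?" point concrete, here is a small standalone sketch. It is not PostgreSQL code: the two comparison functions are hypothetical stand-ins for equality operators from two different operator families, and is_unique_under is an illustrative name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* An "equality operator": returns true when the two values are equal. */
typedef bool (*eq_fn)(const char *, const char *);

/* Byte-wise equality, standing in for a default equality operator. */
static bool
eq_exact(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Case-insensitive equality, standing in for a non-default operator. */
static bool
eq_nocase(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    return *a == *b;    /* both strings must end at the same point */
}

/* True if no two values compare equal under the given operator. */
static bool
is_unique_under(const char *const *vals, size_t n, eq_fn eq)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (eq(vals[i], vals[j]))
                return false;
    return true;
}
```

With the values "abc", "ABC" and "abd", is_unique_under reports the set as unique for eq_exact but not for eq_nocase. A uniqueness guarantee established under one operator therefore cannot be reused blindly for a query that compares with another.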

> There's some precedent for PathKeys, as we generate PathKeys to
> represent the DISTINCT column in PlannerInfo->distinct_pathkeys. On the
> other hand, I've always found it confusing that we use PathKeys to
> represent DISTINCT and GROUP BY, which are not actually sort orderings.
>
>
> OK, I have the same confusion now:)
>
> Perhaps it would make sense to store EquivalenceClass+opfamily in
> UniqueKey, and also replace distinct_pathkeys and group_pathkeys with
> UniqueKeys.
>
>
> I can understand why we need EquivalenceClass for UniqueKey, but I can't
> understand why we need opfamily here.

Thinking a bit harder, I guess we don't, because EquivalenceClass
already includes the operator family, in the ec_opfamilies field.
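
A minimal sketch of what that could look like, under loud assumptions: the EquivalenceClass here is heavily simplified (the real struct keeps ec_opfamilies as a List and has many more fields), n_opfamilies is an invented array length, and both the UniqueKey layout and unique_key_covers_opfamily are hypothetical, not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid */

/* Heavily simplified EquivalenceClass; only the opfamily list is kept. */
typedef struct EquivalenceClass
{
    const Oid *ec_opfamilies;  /* operator families the class is valid for */
    size_t     n_opfamilies;   /* hypothetical: length of the array above */
} EquivalenceClass;

/* Hypothetical UniqueKey: note there is no separate opfamily field. */
typedef struct UniqueKey
{
    const EquivalenceClass *eclass;
} UniqueKey;

/* Does the key's equivalence class already cover the given opfamily? */
static bool
unique_key_covers_opfamily(const UniqueKey *key, Oid opfamily)
{
    for (size_t i = 0; i < key->eclass->n_opfamilies; i++)
        if (key->eclass->ec_opfamilies[i] == opfamily)
            return true;
    return false;
}
```

Since the operator family travels with the EquivalenceClass, a UniqueKey that points at one can answer the opfamily question without storing it a second time.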

> For anyone who is interested in these patchsets, here is my plan
> now. 1). I will try EquivalenceClass rather than Expr in UniqueKey,
> and add opfamily if needed. 2). I will start a new thread to continue
> this topic. The current thread is too long, which may scare off some
> people who might be interested in it. 3). I will give up patches 5 & 6
> for now. One reason is that I am not happy with the current
> implementation; the other is that I want to make the patchset smaller
> to make it easier for reviewers. I will not give them up forever:
> after the main part of this patchset is committed, I will continue
> with them in a new thread. Thanks everyone for your input.

Sounds like a plan.

- Heikki

In response to

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey at 2020-12-05 15:10:28 from Andy Fan

Responses

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey at 2020-12-05 20:40:23 from Tom Lane
