Quick Links

Re: Use unique index for longer pathkeys.

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	amit(dot)kapila16(at)gmail(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Use unique index for longer pathkeys.
Date:	2014-07-14 10:32:40
Message-ID:	20140714.193240.181822165.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thank you for reviewing, the revised patch will come later.

At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1+6b6Wjwf51oZMrL+mKFH8xUp9J-pEhQvoR8SE7sWyTWw(at)mail(dot)gmail(dot)com>
> On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI <
> horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >
> > Hello, This is the continuation from the last CF.
> >
> > This patch intends to make PG to use index for longer pathkeys
> > than index columns when,
> >
> > - The index is a unique index.
> > - All index columns are NOT NULL.
> > - The index column list is a subset of query_pathkeys.
> >
> > ====
> >
> > This patch consists of mainly three parts.
(snip)
>
> I have spent some time on this patch and would like to share my
> findings as below with you.
>
> 1.
> It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
> this patch is not getting set properly. Please refer below test:

Yeah, probably you omitted the second condition above.

> create table t (a int, b int, c int, d text);

The following definition instead will make this work. NULLs
cannot be unique.

| create table t (a int not null, b int not null, c int, d text);

> create unique index i_t_pkey on t(a, b);
> insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
> 100000) a);
> analyze;
> explain (costs off, analyze on) select distinct * from t;
...
> 2.
> typedef struct IndexOptInfo
> bool predOK; /* true if predicate matches query */
> bool unique; /* true if a unique index */
> bool immediate; /* is uniqueness enforced immediately? */
> + bool full_ordered; /* don't key columns duplicate? */
>
> It seems you have forgotten to update out function _outIndexOptInfo().

Mmm, it's surely what I didn't intended to make:( , but the
member actually works in collect_common_primary_pathkeys.

The correct (desirable) code should use (allnotnull && unique)
instead of (full_ordered). I'll fix this in the comming version
later but they are equivelant in functionality. It's not a
mistake of forgetting update out (so broken), but branching from
wrong point (so it works contrary to the expectation):)

> 3.
> + root->distinct_pathkeys =
> + shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide,
> + true);
> +
> + root->query_pathkeys =
> + shorten_pathkeys_following(root,root->query_pathkeys,pk_guide,
> + true);
>
> Add a space after each parameter in function call.

Thank you for pointing out.

> 4.
> +PathKeysComparison
> +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)
>
> a. This function return 4 different type of values (PATHKEYS_EQUAL,
> PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
> however the caller just checks for PATHKEYS_EQUAL, isn't it better to
> just
> return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.

Hmm.. I agree that it is practically removable, but I think it is
necessary from the view of similarity to the function with the
similar name. Perhaps I want another name which don't suggest the
similarity to compare_pathekeys(), say, bool
pathkeys_are_equal_ignoring_strategy() to change the rerurn
values set.

> b. Another idea could be that instead of writting separate function,
> pass an additional parameter to compare_pathkeys() to distinguish
> for ignore strategy case.

It sounds reasonable. According to my faint memory, the reason
why the function is separated from the original one is, perhaps,
simplly a editorial reason.

> c. In any case, I think it is good to add comments on top of newly
> introduced function (compare_pathkeys_ignoring_strategy) or extend
> the comments of compare_pathkeys() if you want to add a new parameter
> to it.

Ouch, thank you for pointing out. I'll add comment if it remains
in the next patch.

> > Finally, the new patch has grown up to twice in size.. Although
> > it seems a bit large, I think that side-effects are clearly
> > avoided.
>
> I am bit worried about the extra cycles added by this patch as compare
> to your previous patch; example
> During trimming of paths, it will build index paths (build_index_pathkeys())
> which it will anyway do again during creation of index paths:
> create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()

I felt the same thing. On stupid measure to reduce cycles would
be caching created pathkeys in IndexOptInfo. I'll try in the next
patch.

> I am not sure if this has direct impact, however I am suspecting trimming
> logic can add quite a few instructions even for cases when it is not
> beneficial.
> At this moment, I also don't have better idea, so lets proceed with review
> of this
> patch unless there is any other better way to achieve it.

I suppose there's some shortcut which can exclude the cases where
obviously there's no use of pathkeys trimming, for example, using
only pathkey lengths. I'll seek for it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Re: Use unique index for longer pathkeys. at 2014-07-14 05:31:52 from Amit Kapila

Responses

Re: Use unique index for longer pathkeys. at 2014-07-14 14:51:17 from Amit Kapila
Re: Use unique index for longer pathkeys. at 2014-07-15 08:47:48 from Kyotaro HORIGUCHI

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fujii Masao	2014-07-14 10:46:43	Re: Incorrect comment in postgres_fdw.c
Previous Message	Shigeru Hanada	2014-07-14 10:31:58	Re: Incorrect comment in postgres_fdw.c