From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Use unique index for longer pathkeys. |
Date: | 2014-07-14 05:31:52 |
Message-ID: | CAA4eK1+6b6Wjwf51oZMrL+mKFH8xUp9J-pEhQvoR8SE7sWyTWw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hello, This is the continuation from the last CF.
>
> This patch intends to make PG to use index for longer pathkeys
> than index columns when,
>
> - The index is a unique index.
> - All index columns are NOT NULL.
> - The index column list is a subset of query_pathkeys.
>
> ====
>
> This patch consists of mainly three parts.
>
> 1. Mark index as 'Uniquely orders the table' if the conditions
> are satisfied. This is the same from the previous patch.
>
> 2. Collect the "common primary pathkeys", which is pathkeys that
> can perfectly sort all involved
> RTE's. (collect_common_primary_pathkeys())
>
> 3. Trim the query pathkeys using the common primary pathkeys.
> (trim_query_pathkeys())
I have spent some time on this patch and would like to share my
findings as below with you.
1.
It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
this patch is not getting set properly. Please refer below test:
create table t (a int, b int, c int, d text);
create unique index i_t_pkey on t(a, b);
insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
100000) a);
analyze;
explain (costs off, analyze on) select distinct * from t;
Unptached -
postgres=# explain (costs off, analyze on) select distinct * from t;
QUERY PLAN
---------------------------------------------------------------------------
Unique (actual time=331.624..597.278 rows=100001 loops=1)
-> Sort (actual time=330.889..430.449 rows=100001 loops=1)
Sort Key: a, b, c, d
Sort Method: external merge Disk: 2344kB
-> Seq Scan on t (actual time=0.013..74.202 rows=100001 loops=1)
Execution time: 665.332 ms
(6 rows)
Patch (pathkey_and_uniqueindx_typ2_v1) -
postgres=# explain (costs off, analyze on) select distinct * from t;
QUERY PLAN
---------------------------------------------------------------------------
Unique (actual time=336.033..601.144 rows=100001 loops=1)
-> Sort (actual time=336.027..436.621 rows=100001 loops=1)
Sort Key: a, b, c, d
Sort Method: external merge Disk: 2344kB
-> Seq Scan on t (actual time=0.014..78.826 rows=100001 loops=1)
Execution time: 667.387 ms
(6 rows)
Shouldn't above query use index scan after patch?
2.
typedef struct IndexOptInfo
bool predOK; /* true if predicate matches query */
bool unique; /* true if a unique index */
bool immediate; /* is uniqueness enforced immediately? */
+ bool full_ordered; /* don't key columns duplicate? */
It seems you have forgotten to update out function _outIndexOptInfo().
3.
+ root->distinct_pathkeys =
+ shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide,
+ true);
+
+ root->query_pathkeys =
+ shorten_pathkeys_following(root,root->query_pathkeys,pk_guide,
+ true);
Add a space after each parameter in function call.
4.
+PathKeysComparison
+compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)
a. This function return 4 different type of values (PATHKEYS_EQUAL,
PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
however the caller just checks for PATHKEYS_EQUAL, isn't it better to
just
return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.
b. Another idea could be that instead of writting separate function,
pass an additional parameter to compare_pathkeys() to distinguish
for ignore strategy case.
c. In any case, I think it is good to add comments on top of newly
introduced function (compare_pathkeys_ignoring_strategy) or extend
the comments of compare_pathkeys() if you want to add a new parameter
to it.
> Finally, the new patch has grown up to twice in size.. Although
> it seems a bit large, I think that side-effects are clearly
> avoided.
I am bit worried about the extra cycles added by this patch as compare
to your previous patch; example
During trimming of paths, it will build index paths (build_index_pathkeys())
which it will anyway do again during creation of index paths:
create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()
I am not sure if this has direct impact, however I am suspecting trimming
logic can add quite a few instructions even for cases when it is not
beneficial.
At this moment, I also don't have better idea, so lets proceed with review
of this
patch unless there is any other better way to achieve it.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Moench-Tegeder | 2014-07-14 05:33:35 | Re: 9.4 documentation: duplicate paragraph in logical decoding example |
Previous Message | Stephen Frost | 2014-07-14 04:29:09 | Re: tweaking NTUP_PER_BUCKET |