From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PATCH: index-only scans with partial indexes |
Date: | 2015-09-13 18:03:06 |
Message-ID: | 1147640721.1321636.1442167386054.JavaMail.yahoo@mail.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> That means I've been unable to measure any significant overhead
> of the patch.
I've run a lot of benchmarks, and with anything resembling a common
query the differences in planning time are lost in the noise. (I
didn't create a better example than Tomas of where a lot of indexes
cause a minimal increase in planning time.)
The test environment is a "bare iron" machine with:
1 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (4 cores; 8 HW threads)
16GB DDR3 RAM
2 1TB drives in RAID 1
ubuntu 14.04 LTS 64-bit
various checkouts from master, most recently a7212a99
no cassert, default optimizations
As one example, to get a heap bigger than RAM I set up like this:
drop table if exists t;
create table t (a int not null, b int not null, x text not null);
insert into t
select i, i, repeat(md5(i::text), 50)
from generate_series(1,10000000) s(i);
vacuum freeze t;
checkpoint;
I ran one-index tests like this:
create index t_b_partial on t(b) where a > 1000 and a < 2000;
vacuum analyze t;
checkpoint;
explain (analyze, buffers, verbose)
select count(b) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(a) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(*) from t where a > 1000 and a < 2000;
... then two-index tests like this:
create index t_b_a_partial on t(b, a) where a > 1000 and a < 2000;
vacuum analyze t;
checkpoint;
explain (analyze, buffers, verbose)
select count(b) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(a) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(*) from t where a > 1000 and a < 2000;
All queries were run 5 times and (to minimize stray slowdowns from
other sources on this desktop machine) I took the minimum plan time
and minimum execution time. (My browser and other optional
processes were stopped to also minimize noise, but the results
still had more noise than I would prefer.)
master - single index
---------------------
Planning time: 0.078 ms
Execution time: 0.544 ms
Planning time: 0.079 ms
Execution time: 0.533 ms
Planning time: 0.066 ms
Execution time: 0.491 ms
master - both indexes
---------------------
Planning time: 0.080 ms
Execution time: 0.396 ms
Planning time: 0.076 ms
Execution time: 0.373 ms
Planning time: 0.056 ms
Execution time: 0.275 ms
patched - single index
----------------------
Planning time: 0.032 ms
Execution time: 0.136 ms
Planning time: 0.079 ms
Execution time: 0.537 ms
Planning time: 0.050 ms
Execution time: 0.213 ms
patched - both indexes
----------------------
Planning time: 0.100 ms
Execution time: 0.373 ms
Planning time: 0.067 ms
Execution time: 0.251 ms
Planning time: 0.065 ms
Execution time: 0.240 ms
In my view, the most disappointing thing about the patch is that
when both indexes are present, it doesn't use the narrower one. If
*only* the narrower index is present, it runs the index-only scan
using that index for count(b) and count(*), which is faster. Can
we wrangle this patch into making a better choice among available
index-only scans?
It also seems disappointing that we don't recognize that
count(columnname) could be treated as a synonym for count(*) if
columnname is NOT NULL, but that doesn't seem like material for
this patch.
Benchmarking took so much time I did not get to a close review of
the code changes. :-( Based on just the test results, it appears
to me that the patch as it stands would be a net win for the vast
majority of workloads where it would make any noticeable difference,
and I didn't manage to create any big downside. I would like to
see whether we can't improve the choice of partial index when there
are multiple possibilities -- it seems quite surprising to see
identical estimates for indexes of different column counts and key
widths, and to see the wider index chosen when the narrow one is
clearly (and predictably) faster.
I am changing this to Waiting on Author.
I will be on vacation without Internet access for the next 15 days,
so hopefully someone else can have a look when a new version is
posted. If it's still open I'll have a look when I get back.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Andreas Seltenreich | 2015-09-13 18:32:30 | Re: 9.3.9 and pg_multixact corruption |
Previous Message | Ildus Kurbangaliev | 2015-09-13 16:43:01 | Re: [PATCH] Refactoring of LWLock tranches |