Re: Secondary index access optimizations

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Secondary index access optimizations
Date: 2017-08-14 16:33:03
Message-ID: 73cfa456-c313-6dc0-825f-093b52b85d29@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14.08.2017 12:37, Konstantin Knizhnik wrote:
> Hi hackers,
>
> I am trying to compare different ways of optimizing work with huge
> append-only tables in PostgreSQL where primary key is something like
> timestamp and queries are usually accessing most recent data using
> some secondary keys. Size of secondary index is one of the most
> critical factors limiting insert/search performance. As far as data
> is inserted in timestamp ascending order, access to primary key is
> well localized and accessed tables are present in memory. But if we
> create secondary key for the whole table, then access to it will
> require random reads from the disk and significantly decrease
> performance.
>
> There are two well known solutions of the problem:
> 1. Table partitioning
> 2. Partial indexes
>
> This approaches I want to compare. First of all I want to check if
> optimizer is able to generate efficient query execution plan covering
> different time intervals.
> Unfortunately in both cases generated plan is not optimal.
>
> 1. Table partitioning:
>
> create table base (k integer primary key, v integer);
> create table part1 (check (k between 1 and 10000)) inherits (base);
> create table part2 (check (k between 10001 and 20000)) inherits (base);
> create index pi1 on part1(v);
> create index pi2 on part2(v);
> insert int part1 values (generate series(1,10000), random());
> insert into part2 values (generate_series(10001,20000), random());
> explain select * from base where k between 1 and 20000 and v = 100;
> QUERY PLAN
> -----------------------------------------------------------------------
> Append (cost=0.00..15.65 rows=3 width=8)
> -> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
> Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
> -> Index Scan using pi1 on part1 (cost=0.29..8.31 rows=1 width=8)
> Index Cond: (v = 100)
> Filter: ((k >= 1) AND (k <= 20000))
> -> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
> Index Cond: (v = 100)
> Filter: ((k >= 1) AND (k <= 20000))
>
> Questions:
> - Is there some way to avoid sequential scan of parent table? Yes, it
> is empty and so sequential scan will not take much time, but ... it
> still requires some additional actions and so increasing query
> execution time.
> - Why index scan of partition indexes includes filter condition if it
> is guaranteed by check constraint that all records of this partition
> match search predicate?
>
>
> 2. Partial indexes:
>
> create table t (k integer primary key, v integer);
> insert into t values (generate_series(1,20000),random());
> create index i1 on t(v) where k between 1 and 10000;
> create index i2 on t(v) where k between 10001 and 20000;
> postgres=# explain select * from t where k between 1 and 10000 and v =
> 100;
> QUERY PLAN
> ------------------------------------------------------------
> Index Scan using i1 on t (cost=0.29..7.28 rows=1 width=8)
> Index Cond: (v = 100)
> (2 rows)
>
>
> Here we get perfect plan. Let's try to extend search interval:
>
>
> postgres=# explain select * from t where k between 1 and 20000 and v =
> 100;
> QUERY PLAN
> ------------------------------------------------------------------
> Index Scan using t_pkey on t (cost=0.29..760.43 rows=1 width=8)
> Index Cond: ((k >= 1) AND (k <= 20000))
> Filter: (v = 100)
> (3 rows)
>
> Unfortunately in this case Postgres is not able to apply partial indexes.
> And this is what I expected to get:
>
> postgres=# explain select * from t where k between 1 and 10000 and v =
> 100 union all select * from t where k between 10001 and 20000 and v =
> 100;
> QUERY PLAN
> ----------------------------------------------------------------------
> Append (cost=0.29..14.58 rows=2 width=8)
> -> Index Scan using i1 on t (cost=0.29..7.28 rows=1 width=8)
> Index Cond: (v = 100)
> -> Index Scan using i2 on t t_1 (cost=0.29..7.28 rows=1 width=8)
> Index Cond: (v = 100)
>
>
> I wonder if there are some principle problems in supporting this two
> things in optimizer:
> 1. Remove search condition for primary key if it is fully satisfied by
> derived table check constraint.
> 2. Append index scans of several partial indexes if specified interval
> is covered by their conditions.
>
> I wonder if someone is familiar with this part of optimizer ad can
> easily fix it.
> Otherwise I am going to spend some time on solving this problems (if
> community think that such optimizations will be useful).
>

Replying to myself: the following small patch removes redundant checks
from index scans for derived tables:

diff --git a/src/backend/optimizer/util/plancat.c
b/src/backend/optimizer/util/plancat.c
index 939045d..1f7c9cf 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1441,6 +1441,20 @@ relation_excluded_by_constraints(PlannerInfo *root,
if (predicate_refuted_by(safe_constraints,
rel->baserestrictinfo, false))
return true;

+ /*
+ * Remove from restriction list items implied by table constraints
+ */
+ safe_restrictions = NULL;
+ foreach(lc, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ if (!predicate_implied_by(list_make1(rinfo->clause),
safe_constraints, false)) {
+ safe_restrictions = lappend(safe_restrictions,
rinfo);
+ }
+ }
+ rel->baserestrictinfo = safe_restrictions;
+
+
return false;
}

=================================================

I am not sure if this is the right place of adjusting restriction list
and is uch transformation always correct.
But for the queries I have tested produced plans are correct:

postgres=# explain select * from base where k between 1 and 20000 and v
= 100;
QUERY PLAN
-----------------------------------------------------------------------
Append (cost=0.00..15.64 rows=3 width=8)
-> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
-> Index Scan using pi1 on part1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
Index Cond: (v = 100)
(7 rows)

postgres=# explain select * from base where k between 1 and 15000 and v
= 100;
QUERY PLAN
-----------------------------------------------------------------------
Append (cost=0.00..15.64 rows=3 width=8)
-> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
Filter: ((k >= 1) AND (k <= 15000) AND (v = 100))
-> Index Scan using pi1 on part1 (cost=0.29..8.30 rows=1 width=8)
Index Cond: (v = 100)
-> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
Index Cond: (v = 100)
Filter: (k <= 15000)
(8 rows)

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-08-14 16:33:48 Re: Orphaned files in base/[oid]
Previous Message Chris Travers 2017-08-14 16:28:48 Re: Orphaned files in base/[oid]