From: | Jan Behrens <jbe(at)public-software-group(dot)org> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | GiST index on INT8, possible bug in query planner? |
Date: | 2018-12-03 16:09:04 |
Message-ID: | 20181203170904.3bebd803bd71f4a24a3e1061@public-software-group.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Dear colleagues,
I have developed two indices using PostgreSQL's awesome GiST support,
one of them available here:
http://www.public-software-group.org/pgLatLon
(which is a lightweight and MIT-licensed alternative to PostGIS for
certain simple tasks involving geographic coordinates on the WGS-84
spheroid)
Recently I had the requirement of creating a multi-column index on
an integer in the first column and a custom data type in the second
column of the index. Since integers are not supported by GiST indices
by default, I used the btree_gist extension by Teodor Sigaev,
Oleg Bartunov, Janko Richter, and Paul Jungwirth, see:
https://www.postgresql.org/docs/10/btree-gist.html
However, the GiST index seems not to work as expected by me when
64-bit integers are involved. I tried to create a minimal
proof-of-concept to demonstrate this. Consider the following setup:
CREATE EXTENSION btree_gist;
CREATE TABLE test4_btree (id SERIAL4, ctx INT4);
CREATE TABLE test8_btree (id SERIAL4, ctx INT8);
CREATE TABLE test4_gist (id SERIAL4, ctx INT4);
CREATE TABLE test8_gist (id SERIAL4, ctx INT8);
I create multi-column indices on all four tables, with "ctx" as primary
and "id" as secondary column:
CREATE INDEX ON test4_btree (ctx, id);
CREATE INDEX ON test8_btree (ctx, id);
CREATE INDEX ON test4_gist USING gist (ctx, id);
CREATE INDEX ON test8_gist USING gist (ctx, id);
Now we add some data:
INSERT INTO test4_btree (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000);
INSERT INTO test8_btree (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000);
INSERT INTO test4_gist (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000);
INSERT INTO test8_gist (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000);
Only the tables directly using the B-tree index ("test4_btree" and
"test8_btree") and the table where "ctx" is 32-bit wide seem to work
properly:
EXPLAIN SELECT * FROM test4_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))
EXPLAIN SELECT * FROM test8_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))
EXPLAIN SELECT * FROM test4_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))
EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: (id = 2)
The query planning for the select on table "test8_gist" does not
include "ctx" in the "Index Cond".
To verify that the above problem isn't just an optimization because of
a low row count, I created a larger example with different values:
CREATE EXTENSION btree_gist;
CREATE TABLE test4_btree (ctx INT4, src INT4);
CREATE TABLE test8_btree (ctx INT8, src INT4);
CREATE TABLE test4_gist (ctx INT4, src INT4);
CREATE TABLE test8_gist (ctx INT8, src INT4);
CREATE INDEX ON test4_btree (ctx, src);
CREATE INDEX ON test8_btree (ctx, src);
CREATE INDEX ON test4_gist USING gist (ctx, src);
CREATE INDEX ON test8_gist USING gist (ctx, src);
INSERT INTO test4_btree SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000);
INSERT INTO test8_btree SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000);
INSERT INTO test4_gist SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000);
INSERT INTO test8_gist SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000);
EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))
EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))
EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))
EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: (src = 2)
ANALYZE;
EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))
EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))
EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))
EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- does not use Index Cond at all, but Filter: ((ctx = 1) AND (src = 2))
SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2; -- fast
SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2; -- fast
SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2; -- fast
SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2; -- slow!
The query on "test8_gist" is significantly slower than in all other
three cases.
I wonder if this is a bug in the query planner, in the GiST facilities
of PostgreSQL, a problem of the "btree_gist" extension, or something
else? Can anyone help me?
Kind regards,
Jan Behrens
--
Public Software Group e. V.
Johannisstr. 12, 10117 Berlin, Germany
www.public-software-group.org
vorstand(at)public-software-group(dot)org
eingetragen in das Vereinregister
des Amtsgerichtes Charlottenburg
Registernummer: VR 28873 B
Vorstände (einzelvertretungsberechtigt):
Jan Behrens
Axel Kistner
Andreas Nitsche
Björn Swierczek
From | Date | Subject | |
---|---|---|---|
Next Message | Igor Korot | 2018-12-03 16:16:13 | Re: How to watch for schema changes |
Previous Message | Adrian Klaver | 2018-12-03 15:17:31 | Re: How to watch for schema changes |