From: | Maxim Ivanov <hi(at)yamlcoder(dot)me> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Use of additional index columns in rows filtering |
Date: | 2023-02-15 08:57:31 |
Message-ID: | N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU=@yamlcoder.me |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi All,
I'd like to report what seems to be a missing optimization opportunity or understand why it is not possible to achieve.
TLDR; additional index column B specified in CREATE INDEX .. (A) INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L
Take following database:
CREATE TABLE t(
a integer NOT NULL,
b integer NOT NULL,
d integer NOT NULL
);
CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);
-- I'd expect index above to behave as index below for the purpose
-- of this query
-- CREATE UNIQUE INDEX ON t(a,b);
INSERT INTO t(
SELECT random() * 100000000 as a,
random() * 3 as b,
generate_series as d FROM generate_series(1,200000)
) ON CONFLICT DO NOTHING;
If we filter on `a` and `b` columns while scanning index created as `(a) INCLUDE (b)` it seems to be fetching tuples from heap to check for condition `b = 4` despite both columns available in the index:
SELECT * FROM t WHERE a > 1000000 and b = 4 ORDER BY a ASC LIMIT 10;
Here is the plan (notice high "shared hit"):
Limit (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
Output: a, b, d
Buffers: shared hit=198307
-> Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1)
Output: a, b, d
Index Cond: (t.a > 1000000)
Filter: (t.b = 4)
Rows Removed by Filter: 197805
Buffers: shared hit=198307
Planning:
Buffers: shared hit=30
Planning Time: 0.201 ms
Execution Time: 84.303 ms
And here is the plan with index on (a,b).
Limit (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
Output: a, b, d
Buffers: shared hit=613
-> Index Scan using t_a_b_idx on public.t (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0 loops=1)
Output: a, b, d
Index Cond: ((t.a > 1000000) AND (t.b = 4))
Buffers: shared hit=613
Planning:
Buffers: shared hit=41
Planning Time: 0.314 ms
Execution Time: 6.910 ms
Because query doesn't sort on `b`, only filters on it while sorting on `a`, I'd expect indexes `(a) INCLUDE (b)` and `(a,b)` behave exactly the same with this particular query.
Interestingly, IndexOnlyScan is capable of using additional columns to filter rows without fetching them from the heap, but only for visibible tuples:
VACUUM FREEZE t;
SELECT a,b FROM t WHERE a > 1000000 and b = 4 ORDER BY a ASC LIMIT 10;
Limit (cost=0.42..6619.76 rows=1 width=8) (actual time=18.479..18.480 rows=0 loops=1)
Output: a, b
Buffers: shared hit=662
-> Index Only Scan using t_a_include_b on public.t (cost=0.42..6619.76 rows=1 width=8) (actual time=18.477..18.477 rows=0 loops=1)
Output: a, b
Index Cond: (t.a > 1000000)
Filter: (t.b = 4)
Rows Removed by Filter: 197771
Heap Fetches: 0
Buffers: shared hit=662
Removing VACUUM makes it behave like IndexScan and fetch candidate tuples from heap all while returning zero rows in the result.
To make query plan comparable I had to force index scan on both with:
SET enable_bitmapscan to off;
SET enable_seqscan to off;
SET max_parallel_workers_per_gather = 0;
Self contained fully reproducible example is in https://dbfiddle.uk/iehtq44L
Regards,
Maxim
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2023-02-15 09:06:49 | Re: [PATCH] Add pretty-printed XML output option |
Previous Message | Alvaro Herrera | 2023-02-15 08:32:01 | Re: Support logical replication of DDLs |