From: | Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Subject: | Re: Avoid full GIN index scan when possible |
Date: | 2019-08-03 01:51:16 |
Message-ID: | a6e1564e-98c8-e420-cfaf-0309316c4910@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Attached 6th version of the patches.
On 01.08.2019 22:28, Tom Lane wrote:
> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
>> On Thu, Aug 1, 2019 at 9:59 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> While I've not attempted to fix that here, I wonder whether we shouldn't
>>> fix it by just forcing forcedRecheck to true in any case where we discard
>>> an ALL qualifier.
>>>
>> +1 for setting forcedRecheck in any case we discard ALL qualifier.
>> ISTM, real life number of cases we can skip recheck here is
>> negligible. And it doesn't justify complexity.
> Yeah, that was pretty much what I was thinking --- by the time we got
> it fully right considering nulls and multicolumn indexes, the cases
> where not rechecking could actually do something useful would be
> pretty narrow. And a bitmap heap scan is always going to have to
> visit the heap, IIRC, so how much could skipping the recheck really
> save?
I have simplified patch #1 setting forcedRecheck for all discarded ALL quals.
(This solution is very close to the earliest unpublished version of the patch.)
More accurate recheck-forcing logic was moved into patch #2 (multicolumn
indexes were fixed). This patch also contains ginFillScanKey() refactoring
and new internal mode GIN_SEARCH_MODE_NOT_NULL that is used only for
GinScanKey.xxxConsistentFn initialization and transformed into
GIN_SEARCH_MODE_ALL before GinScanEntry initialization.
The cost estimation seems to be correct for both patch #1 and patch #2 and
left untouched since v05.
>>> BTW, it's not particularly the fault of this patch, but: what does it
>>> even mean to specify GIN_SEARCH_MODE_ALL with a nonzero number of keys?
>>>
>> It might mean we would like to see all the results, which don't
>> contain given key.
> Ah, right, I forgot that the consistent-fn might look at the match
> results.
Also I decided to go further and tried to optimize (patch #3) the case for
GIN_SEARCH_MODE_ALL with a nonzero number of keys.
Full GIN scan can be avoided in queries like this contrib/intarray query:
"arr @@ '1' AND arr @@ '!2'" (search arrays containing 1 and not containing 2).
Here we have two keys:
- key '1' with GIN_SEARCH_MODE_DEFAULT
- key '2' with GIN_SEARCH_MODE_ALL
Key '2' requires full scan that can be avoided with the forced recheck.
This query is equivalent to single-qual query "a @@ '1 & !2'" which
emits only one GIN key '1' with recheck.
Below is example for contrib/intarray operator @@:
=# CREATE EXTENSION intarray;
=# CREATE TABLE t (a int[]);
=# INSERT INTO t SELECT ARRAY[i] FROM generate_series(1, 1000000) i;
=# CREATE INDEX ON t USING gin (a gin__int_ops);
=# SET enable_seqscan = OFF;
-- master
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1' AND a @@ '!2';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=16000095.45..16007168.16 rows=5019 width=24) (actual time=66.955..66.956 rows=1 loops=1)
Recheck Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))
Heap Blocks: exact=1
Buffers: shared hit=6816
-> Bitmap Index Scan on t_a_idx (cost=0.00..16000094.19 rows=5019 width=0) (actual time=66.950..66.950 rows=1 loops=1)
Index Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))
Buffers: shared hit=6815
Planning Time: 0.086 ms
Execution Time: 67.076 ms
(9 rows)
-- equivalent single-qual query
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1 & !2';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=78.94..7141.57 rows=5025 width=24) (actual time=0.019..0.019 rows=1 loops=1)
Recheck Cond: (a @@ '1 & !2'::query_int)
Heap Blocks: exact=1
Buffers: shared hit=8
-> Bitmap Index Scan on t_a_idx (cost=0.00..77.68 rows=5025 width=0) (actual time=0.014..0.014 rows=1 loops=1)
Index Cond: (a @@ '1 & !2'::query_int)
Buffers: shared hit=7
Planning Time: 0.056 ms
Execution Time: 0.039 ms
(9 rows)
-- with patch #3
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1' AND a @@ '!2';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=75.45..7148.16 rows=5019 width=24) (actual time=0.019..0.020 rows=1 loops=1)
Recheck Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))
Heap Blocks: exact=1
Buffers: shared hit=6
-> Bitmap Index Scan on t_a_idx (cost=0.00..74.19 rows=5019 width=0) (actual time=0.011..0.012 rows=1 loops=1)
Index Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))
Buffers: shared hit=5
Planning Time: 0.059 ms
Execution Time: 0.040 ms
(9 rows)
Patch #3 again contains a similar ugly solution -- we have to remove already
initialized GinScanKeys with theirs GinScanEntries. GinScanEntries can be
shared, so the reference counting was added.
Also modifications of cost estimation in patch #3 are questionable.
GinQualCounts are simply not incremented when haveFullScan flag is set,
because the counters anyway will be overwritten by the caller.
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
0001-Avoid-GIN-full-scan-for-empty-ALL-keys-v06.patch | text/x-patch | 11.2 KB |
0002-Force-GIN-recheck-more-accurately-v06.patch | text/x-patch | 14.4 KB |
0003-Avoid-GIN-full-scan-for-non-empty-ALL-keys-v06.patch | text/x-patch | 8.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2019-08-03 04:05:36 | Re: POC: Cleaning up orphaned files using undo logs |
Previous Message | Andres Freund | 2019-08-03 01:08:02 | Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions |