From: | Julien Rouhaud <julien(dot)rouhaud(at)free(dot)fr> |
---|---|
To: | Georgios Kokolatos <gkokolatos(at)protonmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Supporting = operator in gin/gist_trgm_ops |
Date: | 2020-11-13 09:50:09 |
Message-ID: | CAOBaU_YkkhakwTG4oA886T4CQsHG5hfY+xGA3dTBdZM+DTYJWA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Nov 11, 2020 at 8:34 PM Georgios Kokolatos
<gkokolatos(at)protonmail(dot)com> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: not tested
>
> Hi,
>
> this patch implements a useful and missing feature. Thank you.
>
> It includes documentation, which to a non-native speaker as myself seems appropriate.
> It includes comprehensive tests that cover the implemented cases.
>
> In the thread Alexander has pointed out, quote:
> "It would be more efficient to generate trigrams for equal operator
> using generate_trgm() instead of generate_wildcard_trgm()"
>
> I will echo the sentiment, though from a slightly different and possibly not
> as important point of view. The method used to extract trigrams from the query
> should match the method used to extract trigrams from the values when they
> get added to the index. This is gin_extract_value_trgm() and is indeed using
> generate_trgm().
>
> I have no opinion over Alexander's second comment regarding costing.
>
> I change the status to 'Waiting on Author', but please feel free to override
> my opinion if you feel I am wrong and reset it to 'Needs review'.
Thanks for the reminder Georgios! Thanks a lot Alexander for the review!
Indeed, I should have used generate_trgm() rather than
generate_wildcard_trgm(). IIUC, the rest of the code should still be
doing the same as [I]LikeStrategyNumber. I attach a v3 with that
modification.
For the costing, I tried this naive dataset:
CREATE TABLE t1 AS select md5(random()::text) AS val from
generate_series(1, 100000);
CREATE INDEX t1_btree ON t1 (val);
CREATE INDEX t1_gist ON t1 USING gist (val gist_trgm_ops);
Cost are like this (all default configuration, using any random existing entry):
# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using t1_btree on t1 (cost=0.42..4.44 rows=1
width=33) (actual time=0.192..0.194 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Heap Fetches: 0
Planning Time: 0.133 ms
Execution Time: 0.222 ms
(5 rows)
# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Index Scan using t1_gist on t1 (cost=0.28..8.30 rows=1 width=33)
(actual time=0.542..2.359 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Planning Time: 0.189 ms
Execution Time: 2.382 ms
(4 rows)
# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=400.01..404.02 rows=1 width=33) (actual
time=2.486..2.488 rows=1 loops=1)
Recheck Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on t1_gin (cost=0.00..400.01 rows=1 width=0)
(actual time=2.474..2.474 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Planning Time: 0.206 ms
Execution Time: 2.611 ms
So assuming that this dataset is representative enough, costing indeed
prefers a btree index over a gist/gin index, which should avoid
regression with this change.
Gin is however quite off, likely because it's a bitmap index scan
rather than an index scan, so gist is preferred in this scenario.
That's not ideal, but I'm not sure that there are many people having
both gin_trgm_ops and gist_trgm_ops.
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Handle-operator-in-pg_trgm.patch | application/octet-stream | 12.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2020-11-13 09:52:34 | Re: making update/delete of inheritance trees scale better |
Previous Message | Amit Kapila | 2020-11-13 09:48:58 | Re: logical streaming of xacts via test_decoding is broken |