From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Wildcard search support for pg_trgm |
Date: | 2010-12-11 21:07:32 |
Message-ID: | AANLkTi=2iL7_ZdQgnm2d1qiNC2rJM-faCVtFu2K5A8Lz@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hackers,
Here is first version of patch, which enable index support of wildcard
search in pg_trgm contrib module. The idea of the patch is to extract from
wildcard trigrams which should occurs in wildcard matching string. For
example, for '%sector%' wildcard such trigrams would be: 'sec', 'ect',
'tor'.
create table words (word text);
copy words from '/usr/share/dict/american-english';
test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on words (cost=0.00..1703.11 rows=10 width=9) (actual
time=18.818..174.146 rows=7 loops=1)
Filter: (word ~~* '%independ%'::text)
Total runtime: 174.200 ms
(3 rows)
CREATE INDEX trgm_idx ON words USING gist (word gist_trgm_ops);
test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words (cost=4.36..40.11 rows=10 width=9) (actual
time=2.445..2.529 rows=7 loops=1)
Recheck Cond: (word ~~* '%independ%'::text)
-> Bitmap Index Scan on trgm_idx (cost=0.00..4.35 rows=10 width=0)
(actual time=2.406..2.406 rows=7 loops=1)
Index Cond: (word ~~* '%independ%'::text)
Total runtime: 2.612 ms
(5 rows)
CREATE INDEX trgm_idx ON words USING gin (word gin_trgm_ops);
test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words (cost=76.08..111.83 rows=10 width=9) (actual
time=2.675..2.755 rows=7 loops=1)
Recheck Cond: (word ~~* '%independ%'::text)
-> Bitmap Index Scan on trgm_idx (cost=0.00..76.07 rows=10 width=0)
(actual time=2.642..2.642 rows=7 loops=1)
Index Cond: (word ~~* '%independ%'::text)
Total runtime: 2.839 ms
(5 rows)
I've encountered with following problems:
1) Indexing support for ilike is possible only with case-insensetive
wildcards, e.g. when IGNORECASE macro is enabled. But I can't use this macro
in pg_trgm.sql.in, where list of operators is defined. Probably, is it
enuogh to put comment near IGNORECASE, which tells that if one disable this
macro he should also remove oparators from pg_trgm.sql.in?
2) I found gist index not very useful with default SIGLENINT = 3. I've
changed this value to 15 and I found gist index performs very good on
dictionary. But on longer strings greater values of SIGLENINT may be
required (probably even SIGLENINT > 122 will give benefit in some cases in
spite of TOAST).
----
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
trgm_wildcard-0.1.patch | text/x-patch | 13.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Dimitri Fontaine | 2010-12-11 21:08:32 | pg_execute_from_file, patch v10 |
Previous Message | Heikki Linnakangas | 2010-12-11 21:03:23 | Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records |