From: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
---|---|
To: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Allow ILIKE forward matching to use btree index |
Date: | 2024-12-24 07:04:42 |
Message-ID: | 20241224160442.5852d16fcc64cdfc8238c325@sraoss.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 20 Dec 2024 03:22:26 +0900
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
> Hi,
>
> Currently, btree indexes cannot used for ILIKE (~~*) operator if the pattern
> has case-varying characters although LIKE (~~) expression can be converted
> to indexable clauses by the planner support function (if the collation
> is "C" or operator class 'text_pattern_ops' is used).
>
> For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' AND t < '123fop')"
> and index scan can be used for this condition. On the other hand, "t ~~* '123foo'"
> cannot be converted and sequential scan is used.
>
> Even in this case, we can use a bitmap index scan for the condition
> "(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')" followed by
> recheck by the original condition "t ~~* '123foo'", and this could be faster
> than seqscan.
>
> However, even if the support function would return OR clauses, the current
> planner implementation cannot not build bitmap scan paths using them.
>
>
> The attached patches allow to ILIKE (~~*) forward matching to use btree index.
>
> The patch 0001 enables get_index_clause_from_support to receive OR clauses
> from support functions and use them to build bitmap index scan later. OR clauses
> returned by support functions are collected in the new argument 'orclauses" of
> match_restriction_clauses_to_index(), and they are passed to
> generate_bitmap_or_paths() later to build bitmap scan paths.
>
> The patch 0002 modifies the support function to return OR clauses as an example
> above when ILIKE's pattern has case-varying characters in forward matching. The
> OR clauses contains two index conditions for the upper and the lower letter of
> the first case-varying character in the pattern respectively. Although the
> subsequent characters are not considered in the index scans, it could enough be
> faster then sequent scan.
>
> Here is an example.
>
> 1. Create a table with random text records
>
> =# CREATE TABLE tbl (t text);
> =# INSERT INTO tbl SELECT CASE WHEN i%2=1 THEN upper(x) ELSE x END
> FROM (SELECT i, md5(i::text) x FROM generate_series(1,5000000) i);
>
> 2. Create an index
> =# CREATE INDEX ON tbl (t text_pattern_ops);
>
> 3. Before applying patch: Seq Scan was selected. It takes about 4 sec.
>
> =# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%';
I am sorry, the example in my previous main was wrong. I showed the plan
with enable_index_scan = off. Indead, if the pattern starts with
case-insensitive characters like '12', an index scan can be used even with
ILIKE.
postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..69129.61 rows=500 width=33) (actual time=36.317..4034.770 rows=1188 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=99 read=41939
-> Parallel Seq Scan on tbl (cost=0.00..68079.61 rows=208 width=33) (actual time=19.908..4023.668 rows=396 loops=3)
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 1666271
Buffers: shared hit=99 read=41939
Planning Time: 0.726 ms
Execution Time: 4035.101 ms
(10 rows)
4. After applying patch: Bitmap Index Scan was selected.
postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=12563.66..58314.53 rows=500 width=33) (actual time=156.485..1266.789 rows=1188 loops=1)
Recheck Cond: (((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text) AND (t ~~* 'abc%'::text)) OR ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text) AND (t ~~* 'abc%'::text)))
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 311473
Heap Blocks: exact=42010
Buffers: shared hit=1 read=44600
-> BitmapOr (cost=12563.66..12563.66 rows=297029 width=0) (actual time=136.979..136.980 rows=0 loops=1)
Buffers: shared hit=1 read=2590
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 rows=148515 width=0) (actual time=80.548..80.549 rows=158375 loops=1)
Index Cond: ((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text))
Buffers: shared read=1272
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 rows=148515 width=0) (actual time=56.427..56.427 rows=157042 loops=1)
Index Cond: ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text))
Buffers: shared hit=1 read=1318
Planning Time: 0.592 ms
Execution Time: 1267.162 ms
(16 rows)
> What do you think about it?
>
> I think another application of OR-clause returning support function might be
> allowing to use an index for NOT LIKE (!~~) expression because, for example,
> "t !~~ '123foo%'" could be converted to "(t < '123foo' OR t >= '123fooz')".
> (The second condition is a bit loose but this would be safe and not problem
> since tuples are filtered by the original condition after the index scan.)
> However, it would not very useful unless the distribution is much skew so that
> NOT LIKE expression's selectivity is enough small.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2024-12-24 07:33:47 | Re: date_trunc invalid units with infinite value |
Previous Message | Hayato Kuroda (Fujitsu) | 2024-12-24 06:26:48 | RE: Parallel heap vacuum |