From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Allow ILIKE forward matching to use btree index |
Date: | 2025-01-15 22:34:52 |
Message-ID: | 5aa34780cda992afb7861316ec7b1563db8b9d67.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 2025-01-15 at 01:40 +0900, Yugo NAGATA wrote:
> > > For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' AND
> > > t < '123fop')"
> > > and index scan can be used for this condition. On the other hand,
> > > "t ~~* '123foo'"
> > > cannot be converted and sequential scan is used.
> > >
> > > Even in this case, we can use a bitmap index scan for the
> > > condition
> > > "(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')"
> > > followed by
> > > recheck by the original condition "t ~~* '123foo'", and this
> > > could be faster
> > > than seqscan.
In theory, there could be many OR clauses:
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
> > > The patch 0001 enables get_index_clause_from_support to receive
> > > OR clauses
> > > from support functions and use them to build bitmap index scan
> > > later. OR clauses
> > > returned by support functions are collected in the new argument
> > > 'orclauses" of
> > > match_restriction_clauses_to_index(), and they are passed to
> > > generate_bitmap_or_paths() later to build bitmap scan paths.
If I understand correctly, this
> > > The patch 0002 modifies the support function to return OR clauses
> > > as an example
> > > above when ILIKE's pattern has case-varying characters in forward
> > > matching. The
> > > OR clauses contains two index conditions for the upper and the
> > > lower letter of
> > > the first case-varying character in the pattern respectively.
> > > Although the
> > > subsequent characters are not considered in the index scans, it
> > > could enough be
> > > faster then sequent scan.
> > >
> > > Here is an example.
> > >
> > > 1. Create a table with random text records
> > >
> > > =# CREATE TABLE tbl (t text);
> > > =# INSERT INTO tbl SELECT CASE WHEN i%2=1 THEN upper(x) ELSE x
> > > END
> > > FROM (SELECT i, md5(i::text) x FROM
> > > generate_series(1,5000000) i);
> > >
> > > 2. Create an index
> > > =# CREATE INDEX ON tbl (t text_pattern_ops);
> > >
> > > 3. Before applying patch: Seq Scan was selected. It takes about 4
> > > sec.
> > >
> > > =# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%';
> >
> > I am sorry, the example in my previous main was wrong. I showed the
> > plan
> > with enable_index_scan = off. Indead, if the pattern starts with
> > case-insensitive characters like '12', an index scan can be used
> > even with
> > ILIKE.
> >
> > postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
> > QUERY
> > PLAN
> > -------------------------------------------------------------------
> > ------------------------------------------------------
> > Gather (cost=1000.00..69129.61 rows=500 width=33) (actual
> > time=36.317..4034.770 rows=1188 loops=1)
> > Workers Planned: 2
> > Workers Launched: 2
> > Buffers: shared hit=99 read=41939
> > -> Parallel Seq Scan on tbl (cost=0.00..68079.61 rows=208
> > width=33) (actual time=19.908..4023.668 rows=396 loops=3)
> > Filter: (t ~~* 'abc%'::text)
> > Rows Removed by Filter: 1666271
> > Buffers: shared hit=99 read=41939
> > Planning Time: 0.726 ms
> > Execution Time: 4035.101 ms
> > (10 rows)
> >
> > 4. After applying patch: Bitmap Index Scan was selected.
> >
> > postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
> >
> > QUERY
> > PLAN
> >
> > -------------------------------------------------------------------
> > -------------------------------------------------------------------
> > ------------------------
> > Bitmap Heap Scan on tbl (cost=12563.66..58314.53 rows=500
> > width=33) (actual time=156.485..1266.789 rows=1188 loops=1)
> > Recheck Cond: (((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text) AND (t
> > ~~* 'abc%'::text)) OR ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text) AND
> > (t ~~* 'abc%'::text)))
> > Filter: (t ~~* 'abc%'::text)
> > Rows Removed by Filter: 311473
> > Heap Blocks: exact=42010
> > Buffers: shared hit=1 read=44600
> > -> BitmapOr (cost=12563.66..12563.66 rows=297029 width=0)
> > (actual time=136.979..136.980 rows=0 loops=1)
> > Buffers: shared hit=1 read=2590
> > -> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71
> > rows=148515 width=0) (actual time=80.548..80.549 rows=158375
> > loops=1)
> > Index Cond: ((t ~>=~ 'A'::text) AND (t ~<~
> > 'B'::text))
> > Buffers: shared read=1272
> > -> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71
> > rows=148515 width=0) (actual time=56.427..56.427 rows=157042
> > loops=1)
> > Index Cond: ((t ~>=~ 'a'::text) AND (t ~<~
> > 'b'::text))
> > Buffers: shared hit=1 read=1318
> > Planning Time: 0.592 ms
> > Execution Time: 1267.162 ms
> > (16 rows)
> >
> >
> > > What do you think about it?
> > >
> > > I think another application of OR-clause returning support
> > > function might be
> > > allowing to use an index for NOT LIKE (!~~) expression because,
> > > for example,
> > > "t !~~ '123foo%'" could be converted to "(t < '123foo' OR t >=
> > > '123fooz')".
> > > (The second condition is a bit loose but this would be safe and
> > > not problem
> > > since tuples are filtered by the original condition after the
> > > index scan.)
> > > However, it would not very useful unless the distribution is much
> > > skew so that
> > > NOT LIKE expression's selectivity is enough small.
>
> I added a new patch 0003 that enables ILIKE forward matching to use a
> SP-Gist index.
> Similar to 0002, this generates BitmapOr Index Scan for two index
> clauses that use
> "^@" operator for upper letter and lower letter pattern respectively.
>
> * Before applying the patch:
>
> postgres=# explain analyze select* from tbl where t ilike 'abc%';
> QUERY
> PLAN
> ---------------------------------------------------------------------
> -------------------------------------------------
> Gather (cost=1000.00..18899.52 rows=101 width=33) (actual
> time=41.799..930.382 rows=253 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> Buffers: shared hit=96 read=12533
> -> Parallel Seq Scan on tbl (cost=0.00..17889.42 rows=42
> width=33) (actual time=26.041..917.570 rows=84 loops=3)
> Filter: (t ~~* 'abc%'::text)
> Rows Removed by Filter: 336582
> Buffers: shared hit=96 read=12533
> Planning Time: 0.690 ms
> Execution Time: 930.545 ms
> (10 rows)
>
>
> * After applying the patch:
>
> postgres=# explain analyze select* from tbl where t ilike 'abc%';
> QUERY
> PLAN
> ---------------------------------------------------------------------
> ----------------------------------------------------------------
> Bitmap Heap Scan on tbl (cost=3307.96..16702.11 rows=101 width=33)
> (actual time=69.449..215.636 rows=253 loops=1)
> Recheck Cond: (((t ^@ 'A'::text) AND (t ~~* 'abc%'::text)) OR ((t
> ^@ 'a'::text) AND (t ~~* 'abc%'::text)))
> Filter: (t ~~* 'abc%'::text)
> Rows Removed by Filter: 62992
> Heap Blocks: exact=12437
> Buffers: shared hit=18529
> -> BitmapOr (cost=3307.96..3307.96 rows=61212 width=0) (actual
> time=62.567..62.568 rows=0 loops=1)
> Buffers: shared hit=6092
> -> Bitmap Index Scan on tbl_t_idx (cost=0.00..1653.96
> rows=30606 width=0) (actual time=41.893..41.893 rows=31536 loops=1)
> Index Cond: (t ^@ 'A'::text)
> Buffers: shared hit=2461
> -> Bitmap Index Scan on tbl_t_idx (cost=0.00..1653.96
> rows=30606 width=0) (actual time=20.671..20.671 rows=31709 loops=1)
> Index Cond: (t ^@ 'a'::text)
> Buffers: shared hit=3631
> Planning Time: 1.414 ms
> Execution Time: 215.903 ms
> (16 rows)
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2025-01-15 22:40:19 | Re: Allow ILIKE forward matching to use btree index |
Previous Message | Peter Eisentraut | 2025-01-15 22:20:58 | Re: Use Python "Limited API" in PL/Python |