From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | digoal(at)126(dot)com |
Subject: | BUG #15014: pg_trgm regexp with wchar not good? |
Date: | 2018-01-18 13:03:46 |
Message-ID: | 20180118130346.3630.56019@wrigleys.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 15014
Logged by: Zhou Digoal
Email address: digoal(at)126(dot)com
PostgreSQL version: 10.1
Operating system: CentOS 7.x.x64
Description:
HI,
when i use pg_trgm's gin index, with wchar search, it's not good for regexp,
but good for like express.
```
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access
privileges
-----------+----------+----------+------------+------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 |
template0 | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | =c/postgres
+
| | | | |
postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | =c/postgres
+
create or replace function gen_hanzi(int) returns text as $$
declare
res text;
begin
if $1 >=1 then
select string_agg(chr(19968+(random()*20901)::int), '') into res from
generate_series(1,$1);
return res;
end if;
return null;
end;
$$ language plpgsql strict;
postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test select generate_series(1,100000),
gen_hanzi(100);
INSERT 0 100000
postgres=# create index idx_test_1 on test using gin (info gin_trgm_ops);
CREATE INDEX
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test
where info ~ '婐绷乂畳';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.test (cost=45261409.28..45261421.30 rows=10
width=36) (actual time=583.810..816.503 rows=1 loops=1)
Output: id, info
Recheck Cond: (test.info ~ '婐绷乂畳'::text)
Rows Removed by Index Recheck: 99999
Heap Blocks: exact=4167
Buffers: shared hit=59783
-> Bitmap Index Scan on idx_test_1 (cost=0.00..45261409.28 rows=10
width=0) (actual time=583.237..583.237 rows=100000 loops=1)
Index Cond: (test.info ~ '婐绷乂畳'::text)
Buffers: shared hit=55616
Planning time: 0.150 ms
Execution time: 816.545 ms
(11 rows)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test
where info ~ '123';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.test (cost=39.40..2897.60 rows=4000 width=36)
(actual time=0.046..0.046 rows=0 loops=1)
Output: id, info
Recheck Cond: (test.info ~ '123'::text)
Buffers: shared hit=4
-> Bitmap Index Scan on idx_test_1 (cost=0.00..38.40 rows=4000 width=0)
(actual time=0.043..0.043 rows=0 loops=1)
Index Cond: (test.info ~ '123'::text)
Buffers: shared hit=4
Planning time: 0.146 ms
Execution time: 0.072 ms
(9 rows)
```
best regards,
digoal
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2018-01-18 15:15:53 | Re: BUG #15014: pg_trgm regexp with wchar not good? |
Previous Message | Kyotaro HORIGUCHI | 2018-01-18 12:14:49 | Re: Index-only scan returns incorrect results when using a composite GIST index with a gist_trgm_ops column. |