From: | Atsushi Ogawa <atsushi(dot)ogawa(at)gmail(dot)com> |
---|---|
To: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Boyer-More-Horspool searching LIKE queries |
Date: | 2022-01-11 13:55:16 |
Message-ID: | CAO2GH9Ekvcj19ihQyjk_S6RUjPHADhUfPQBU0gRMs4qVfk12Aw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I have created a patch to enable the Boyer-More-Horspool search
algorithm (B-M-H) for LIKE queries.
B-M-H needs to initialize the skip table and keep it during SQL execution.
In this patch, flinfo->fn_extra is used to keep the skip table.
The conditions under which B-M-H can be used are as follows.
(1) B-M-H in LIKE search supports only single-byte character sets and UTF8.
Multibyte character sets does not support, because it may contain another
characters in the byte sequence. For UTF-8, it works fine, because in
UTF-8 the byte sequence of one character cannot contain another character.
(2) The pattern string should be stable parameter, because B-M-H needs to
keep
the skip table generated from the pattern string during the execution of
the query.
(3) The pattern string should be at least 4 characters.
For example, '%AB%' can use B-M-H.
(4) The first and last character of the pattern string should be '%'.
(5) Characters other than the first and last of the pattern string
should not be '%', '_'. However, escaped characters such as
'\%', '\_' are available.
Also, this patch changes the collation validity check in functions
(such as textlike) to be performed at the first execution of the query,
instead of each function execution.
I have measured the performance with the following query.
---------- ---------- ---------- ---------- ---------- ---------- ----------
SET client_min_messages TO notice;
\timing
DO $$
DECLARE
cnt integer := 0;
total integer := 0;
BEGIN
FOR i IN 1..500 LOOP
select count(*) into cnt
from pg_catalog.pg_description d
where d.description like '%greater%';
total := total + cnt;
END LOOP;
RAISE NOTICE 'TOTAL: %', total;
END
$$
;
---------- ---------- ---------- ---------- ---------- ---------- ----------
Result
Without patch: 257.504ms
With patch: 191.638ms
Regards,
Atsushi Ogawa
Attachment | Content-Type | Size |
---|---|---|
like_bmh.patch | application/octet-stream | 10.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2022-01-11 14:28:43 | Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes |
Previous Message | Amit Kapila | 2022-01-11 13:28:19 | Re: [Ext:] Re: Stream Replication not working |