Quick Links

Re: Row pattern recognition

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, jacob(dot)champion(at)enterprisedb(dot)com, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Row pattern recognition
Date:	2024-12-30 23:57:07
Message-ID:	20241231.085707.2149067544950521144.ishii@postgresql.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> I have added further optimization to the v25 patch.
>
> While generating possible input strings that may satisfy the pattern
> string, it is possible to omit to run regexp in some cases. Since
> regexp matching is heavy operation, especially if it is applied to
> long string, it is beneficial for RPR to reduce the number of regexp
> runs.
>
> If the tail pattern variable has '+' quantifier and previously the
> input string was confirmed to be matched the pattern string, and the
> same character as the tail pattern string is added, we don't need run
> regexp match again because the new input string surely matches the
> pattern string. Suppose a pattern string is "ab+" and the current
> input string is "ab" (this satisfies "ab+"). If the new input string
> is "abb", then "abb" surely matches "ab+" too and we don't need to run
> regexp again.
>
> In v26 patch, by using the technique above I get performance
> improvement.
>
>>> EXPLAIN (ANALYZE)
>>> SELECT aid, bid, count(*) OVER w
>>> FROM pgbench_accounts WHERE aid <= 10000
>>> WINDOW w AS (
>>> PARTITION BY bid
>>> ORDER BY aid
>>> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>> AFTER MATCH SKIP PAST LAST ROW
>>> INITIAL
>>> PATTERN (START UP+)
>>> DEFINE
>>> START AS TRUE,
>>> UP AS aid > PREV(aid)
>>> );
>
> This SQL took 322.5913 ms (average in 3 runs) in v24. With v26 patch,
> it takes 41.84 ms, which is over 7 times improvement. Also I run the
> SQL in 100k row case. v23 took 26 seconds. With the v26 patch it takes
> 1195.603 ms, which is over 21 times improvement.
>
> I think a pattern string ended up with '+' is one of common use cases
> of RPR, and I believe the improvement is useful for many RPR
> applications.
>
> I also add following changes to v25.
>
> - Fix do_pattern_match to use the top memory context to store compiled
> re cache. Before it was in per query memory context. This causes a
> trouble because do_pattern_match checks the cache existence using
> a static variable.
>
> - Refactor search_str_set, which is a workhorse of pattern matching,
> into multiple functions to understand the logic easier.

CFBot complains a compiler error in the v26 patch.
Attached is v27 patch to fix this. Also some typo in comment are fixed.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Attachment	Content-Type	Size
v27-0001-Row-pattern-recognition-patch-for-raw-parser.patch	text/x-patch	19.9 KB
v27-0002-Row-pattern-recognition-patch-parse-analysis.patch	text/x-patch	11.6 KB
v27-0003-Row-pattern-recognition-patch-rewriter.patch	text/x-patch	4.1 KB
v27-0004-Row-pattern-recognition-patch-planner.patch	text/x-patch	6.6 KB
v27-0005-Row-pattern-recognition-patch-executor.patch	text/x-patch	60.3 KB
v27-0006-Row-pattern-recognition-patch-docs.patch	text/x-patch	9.7 KB
v27-0007-Row-pattern-recognition-patch-tests.patch	text/x-patch	50.6 KB
v27-0008-Row-pattern-recognition-patch-typedefs.list.patch	text/x-patch	1.2 KB
v27-0009-Allow-to-print-raw-parse-tree.patch	text/x-patch	785 bytes

In response to

Re: Row pattern recognition at 2024-12-30 13:37:18 from Tatsuo Ishii

Responses

Re: Row pattern recognition at 2025-01-11 05:46:11 from Tatsuo Ishii

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	James Hunter	2024-12-31 00:34:51	Re: Add the ability to limit the amount of memory that can be allocated to backends.
Previous Message	David Rowley	2024-12-30 23:12:23	Re: Add the ability to limit the amount of memory that can be allocated to backends.