From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, jacob(dot)champion(at)enterprisedb(dot)com, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Row pattern recognition |
Date: | 2024-12-30 23:57:07 |
Message-ID: | 20241231.085707.2149067544950521144.ishii@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> I have added further optimization to the v25 patch.
>
> While generating possible input strings that may satisfy the pattern
> string, it is possible to omit to run regexp in some cases. Since
> regexp matching is heavy operation, especially if it is applied to
> long string, it is beneficial for RPR to reduce the number of regexp
> runs.
>
> If the tail pattern variable has '+' quantifier and previously the
> input string was confirmed to be matched the pattern string, and the
> same character as the tail pattern string is added, we don't need run
> regexp match again because the new input string surely matches the
> pattern string. Suppose a pattern string is "ab+" and the current
> input string is "ab" (this satisfies "ab+"). If the new input string
> is "abb", then "abb" surely matches "ab+" too and we don't need to run
> regexp again.
>
> In v26 patch, by using the technique above I get performance
> improvement.
>
>>> EXPLAIN (ANALYZE)
>>> SELECT aid, bid, count(*) OVER w
>>> FROM pgbench_accounts WHERE aid <= 10000
>>> WINDOW w AS (
>>> PARTITION BY bid
>>> ORDER BY aid
>>> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>> AFTER MATCH SKIP PAST LAST ROW
>>> INITIAL
>>> PATTERN (START UP+)
>>> DEFINE
>>> START AS TRUE,
>>> UP AS aid > PREV(aid)
>>> );
>
> This SQL took 322.5913 ms (average in 3 runs) in v24. With v26 patch,
> it takes 41.84 ms, which is over 7 times improvement. Also I run the
> SQL in 100k row case. v23 took 26 seconds. With the v26 patch it takes
> 1195.603 ms, which is over 21 times improvement.
>
> I think a pattern string ended up with '+' is one of common use cases
> of RPR, and I believe the improvement is useful for many RPR
> applications.
>
> I also add following changes to v25.
>
> - Fix do_pattern_match to use the top memory context to store compiled
> re cache. Before it was in per query memory context. This causes a
> trouble because do_pattern_match checks the cache existence using
> a static variable.
>
> - Refactor search_str_set, which is a workhorse of pattern matching,
> into multiple functions to understand the logic easier.
CFBot complains a compiler error in the v26 patch.
Attached is v27 patch to fix this. Also some typo in comment are fixed.
Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachment | Content-Type | Size |
---|---|---|
v27-0001-Row-pattern-recognition-patch-for-raw-parser.patch | text/x-patch | 19.9 KB |
v27-0002-Row-pattern-recognition-patch-parse-analysis.patch | text/x-patch | 11.6 KB |
v27-0003-Row-pattern-recognition-patch-rewriter.patch | text/x-patch | 4.1 KB |
v27-0004-Row-pattern-recognition-patch-planner.patch | text/x-patch | 6.6 KB |
v27-0005-Row-pattern-recognition-patch-executor.patch | text/x-patch | 60.3 KB |
v27-0006-Row-pattern-recognition-patch-docs.patch | text/x-patch | 9.7 KB |
v27-0007-Row-pattern-recognition-patch-tests.patch | text/x-patch | 50.6 KB |
v27-0008-Row-pattern-recognition-patch-typedefs.list.patch | text/x-patch | 1.2 KB |
v27-0009-Allow-to-print-raw-parse-tree.patch | text/x-patch | 785 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | James Hunter | 2024-12-31 00:34:51 | Re: Add the ability to limit the amount of memory that can be allocated to backends. |
Previous Message | David Rowley | 2024-12-30 23:12:23 | Re: Add the ability to limit the amount of memory that can be allocated to backends. |