From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, jacob(dot)champion(at)enterprisedb(dot)com, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Row pattern recognition |
Date: | 2024-12-19 06:19:50 |
Message-ID: | 20241219.151950.488757175470671324.ishii@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I have looked into the performance of current RPR implementation,
especially when the number of rows in a reduced frame is large (like
over 10k). Below is a simple benchmark against pgbench database. The
SQL will produce a reduced frame having 10k rows.
EXPLAIN (ANALYZE)
SELECT aid, bid, count(*) OVER w
FROM pgbench_accounts WHERE aid <= 10000
WINDOW w AS (
PARTITION BY bid
ORDER BY aid
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+)
DEFINE
START AS TRUE,
UP AS aid > PREV(aid)
);
This took 722 ms on my laptop. It's not very quick. Moreover, if I
expand the reduced frame size to 100k (aid <= 100000), OOM killer
triggered. I looked into the code and found that do_pattern_match in
nodeWindowAgg.c is one of the major problems. It calls regexp_instr to
know whether the regular expression derived from a PATTERN clause
(e.g. "ab+c+") matches an encoded row pattern variable string
(e.g. "abbcc"). The latter string could be quite long: the length
could be as same as the number of rows in the reduced frame. Thus, The
length could become 100k if the frame size is 100k. Unfortunately
regexp_instr needs to allocate and convert the input string to wchar
(it's 4-byte long for each character), which uses 4x space bigger than
the original input string. In RPR case the input string is always
ASCII and does not need to be converted to wchar. So I decided to
switch to the standard regular expression engine coming with OS. With
this change, I got 2x speed up in the 10k case.
v23 patch: 722.618 ms (average of 3 runs)
new patch: 322.5913 ms (average of 3 runs)
Also I tried the 100k rows reduced frame case. It was slow (took 26
seconds) but it completed without OOM killer. Attached is the
patch. The change was in 0005 only. Other patches were not changed
from v23.
Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachment | Content-Type | Size |
---|---|---|
v24-0001-Row-pattern-recognition-patch-for-raw-parser.patch | text/x-patch | 19.9 KB |
v24-0002-Row-pattern-recognition-patch-parse-analysis.patch | text/x-patch | 11.6 KB |
v24-0003-Row-pattern-recognition-patch-rewriter.patch | text/x-patch | 4.1 KB |
v24-0004-Row-pattern-recognition-patch-planner.patch | text/x-patch | 6.7 KB |
v24-0005-Row-pattern-recognition-patch-executor.patch | text/x-patch | 54.4 KB |
v24-0006-Row-pattern-recognition-patch-docs.patch | text/x-patch | 9.7 KB |
v24-0007-Row-pattern-recognition-patch-tests.patch | text/x-patch | 50.6 KB |
v24-0008-Allow-to-print-raw-parse-tree.patch | text/x-patch | 785 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Shlok Kyal | 2024-12-19 06:27:12 | Re: Logical replication timeout |
Previous Message | Bertrand Drouvot | 2024-12-19 06:12:04 | Re: per backend I/O statistics |