From: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
---|---|
To: | jchampion(at)timescale(dot)com |
Cc: | er(at)xs4all(dot)nl, vik(at)postgresfriends(dot)org, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Row pattern recognition |
Date: | 2023-09-12 06:18:43 |
Message-ID: | 20230912.151843.1810014897541305662.t-ishii@sranhm.sra.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>> What about leaving this (reevaluation) for now? Because:
>>
>> 1) we don't have CLASSIFIER
>> 2) we don't allow to give CLASSIFIER to PREV as its arggument
>>
>> so I think we don't need to worry about this for now.
>
> Sure. I'm all for deferring features to make it easier to iterate; I
> just want to make sure the architecture doesn't hit a dead end. Or at
> least, not without being aware of it.
Ok, let's defer this issue. Currently the patch already exceeds 3k
lines. I am afraid too big patch cannot be reviewed by anyone, which
means it will never be committed.
> Also: is CLASSIFIER the only way to run into this issue?
Good question. I would like to know.
>> What if we don't follow the standard, instead we follow POSIX EREs? I
>> think this is better for users unless RPR's REs has significant merit
>> for users.
>
> Piggybacking off of what Vik wrote upthread, I think we would not be
> doing ourselves any favors by introducing a non-compliant
> implementation that performs worse than a traditional NFA. Those would
> be some awful bug reports.
What I am not sure about is, you and Vik mentioned that the
traditional NFA is superior that POSIX NFA in terms of performance.
But how "lexicographic ordering" is related to performance?
>> I am not sure if we need to worry about this because of the reason I
>> mentioned above.
>
> Even if we adopted POSIX NFA semantics, we'd still have to implement
> our own parser for the PATTERN part of the query. I don't think
> there's a good way for us to reuse the parser in src/backend/regex.
Ok.
>> > Does that seem like a workable approach? (Worst-case, my code is just
>> > horrible, and we throw it in the trash.)
>>
>> Yes, it seems workable. I think for the first cut of RPR needs at
>> least the +quantifier with reasonable performance. The current naive
>> implementation seems to have issue because of exhaustive search.
>
> +1
BTW, attched is the v6 patch. The differences from v5 include:
- Now aggregates can be used with RPR. Below is an example from the
regression test cases, which is added by v6 patch.
- Fix assersion error pointed out by Erik.
SELECT company, tdate, price,
first_value(price) OVER w,
last_value(price) OVER w,
max(price) OVER w,
min(price) OVER w,
sum(price) OVER w,
avg(price) OVER w,
count(price) OVER w
FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+ DOWN+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
company | tdate | price | first_value | last_value | max | min | sum | avg | count
----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
company1 | 07-01-2023 | 100 | 100 | 140 | 200 | 100 | 590 | 147.5000000000000000 | 4
company1 | 07-02-2023 | 200 | | | | | | |
company1 | 07-03-2023 | 150 | | | | | | |
company1 | 07-04-2023 | 140 | | | | | | |
company1 | 07-05-2023 | 150 | | | | | | |
company1 | 07-06-2023 | 90 | 90 | 120 | 130 | 90 | 450 | 112.5000000000000000 | 4
company1 | 07-07-2023 | 110 | | | | | | |
company1 | 07-08-2023 | 130 | | | | | | |
company1 | 07-09-2023 | 120 | | | | | | |
company1 | 07-10-2023 | 130 | | | | | | |
company2 | 07-01-2023 | 50 | 50 | 1400 | 2000 | 50 | 4950 | 1237.5000000000000000 | 4
company2 | 07-02-2023 | 2000 | | | | | | |
company2 | 07-03-2023 | 1500 | | | | | | |
company2 | 07-04-2023 | 1400 | | | | | | |
company2 | 07-05-2023 | 1500 | | | | | | |
company2 | 07-06-2023 | 60 | 60 | 1200 | 1300 | 60 | 3660 | 915.0000000000000000 | 4
company2 | 07-07-2023 | 1100 | | | | | | |
company2 | 07-08-2023 | 1300 | | | | | | |
company2 | 07-09-2023 | 1200 | | | | | | |
company2 | 07-10-2023 | 1300 | | | | | | |
(20 rows)
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Row-pattern-recognition-patch-for-raw-parser.patch | text/x-patch | 20.7 KB |
v6-0002-Row-pattern-recognition-patch-parse-analysis.patch | text/x-patch | 11.6 KB |
v6-0003-Row-pattern-recognition-patch-planner.patch | text/x-patch | 4.8 KB |
v6-0004-Row-pattern-recognition-patch-executor.patch | text/x-patch | 34.9 KB |
v6-0005-Row-pattern-recognition-patch-docs.patch | text/x-patch | 8.2 KB |
v6-0006-Row-pattern-recognition-patch-tests.patch | text/x-patch | 35.3 KB |
v6-0007-Allow-to-print-raw-parse-tree.patch | text/x-patch | 749 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2023-09-12 06:27:35 | Re: Make --help output fit within 80 columns per line |
Previous Message | Michael Paquier | 2023-09-12 06:18:05 | Re: Make psql's qeury canceling test simple by using signal() routine of IPC::Run |