Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: vik(at)postgresfriends(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2023-06-28 00:58:19
Message-ID: 20230628.095819.1123345378945479712.t-ishii@sranhm.sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Okay, I see the problem now, and why you need the rpr() function.
>
> You are doing this as something that happens over a window frame, but
> it is actually something that *reduces* the window frame. The pattern
> matching needs to be done when the frame is calculated and not when
> any particular function is applied over it.

Yes. (I think the standard calls the window frame as "full window
frame" in context of RPR to make a contrast with the subset of the
frame rows restricted by RPR. The paper I refered to as [2] claims
that the latter window frame is called "reduced window frame" in the
standard but I wasn't able to find the term in the standard.)

I wanted to demonstate that pattern matching logic is basically
correct in the PoC patch. Now what I need to do is, move the row
pattern matching logic to somewhere inside nodeWindowAgg so that
"restricted window frame" can be applied to all window functions and
window aggregates. Currently I am looking into update_frameheadpos()
and update_frametailpos() which calculate the frame head and tail
against current row. What do you think?

> This query (with all the defaults made explicit):
>
> SELECT s.company, s.tdate, s.price,
> FIRST_VALUE(s.tdate) OVER w,
> LAST_VALUE(s.tdate) OVER w,
> lowest OVER w
> FROM stock AS s
> WINDOW w AS (
> PARTITION BY s.company
> ORDER BY s.tdate
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> EXCLUDE NO OTHERS
> MEASURES
> LAST(DOWN) AS lowest
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL PATTERN (START DOWN+ UP+)
> DEFINE
> START AS TRUE,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );
>
> Should produce this result:

[snip]

Thanks for the examples. I agree with the expected query results.

>>>> o SUBSET is not supported
>>>
>>> Is this because you haven't done it yet, or because you ran into
>>> problems trying to do it?
>> Because it seems SUBSET is not useful without MEASURES support. Thus
>> my plan is, firstly implement MEASURES, then SUBSET. What do you
>> think?
>
>
> SUBSET elements can be used in DEFINE clauses, but I do not think this
> is important compared to other features.

Ok.

>>> I have not looked at the patch yet, but is the reason for doing R020
>>> before R010 because you haven't done the MEASURES clause yet?
>> One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
>> harder for me because modifying main SELECT clause could be a hard
>> work. Another reason is, I had no idea how to implement PREV/NEXT in
>> other than in WINDOW clause. Other people might feel differently
>> though.
>
>
> I think we could do this with a single tuplesort if we use
> backtracking (which might be really slow for some patterns). I have
> not looked into it in any detail.
>
> We would need to be able to remove tuples from the end (even if only
> logically), and be able to update tuples inside the store. Both of
> those needs come from backtracking and possibly changing the
> classifier.
>
> Without backtracking, I don't see how we could do it without have a
> separate tuplestore for every current possible match.

Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
into Window clause with RPR?

> I looked at your v2 patches a little bit and the only comment that I
> currently have on the code is you spelled PERMUTE as
> PREMUTE. Everything else is hopefully explained above.

Thanks. Will fix.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-06-28 01:02:21 Re: check_strxfrm_bug()
Previous Message Kyotaro Horiguchi 2023-06-28 00:20:27 Re: Add TLI number to name of files generated by pg_waldump --save-fullpage