Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: jacob(dot)champion(at)enterprisedb(dot)com
Cc: vik(at)postgresfriends(dot)org, pgsql-hackers(at)postgresql(dot)org, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org
Subject: Re: Row pattern recognition
Date: 2024-09-30 00:07:51
Message-ID: 20240930.090751.1732676018799092095.ishii@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> While playing with the feature, I've been trying to identify runs of
>> matched rows by eye. But it's pretty difficult -- the best I can do is
>> manually count rows using a `COUNT(*) OVER ...`. So I'd like to
>> suggest that MEASURES be part of the eventual v1 feature, if there's
>> no other way to determine whether a row was skipped by a previous
>> match. (That was less obvious to me before the fix in v17.)
>
> I think implementing MEASURES is challenging. Especially we need to
> find how our parser accepts "colname OVER
> window_definition". Currently PostgreSQL's parser only accepts "func()
> OVER window_definition" Even it is technically possible, I think the
> v1 patch size will become much larger than now due to this.
>
> How about inventing new window function that returns row state instead?
>
> - match found (yes/no)
> - skipped due to AFTER MATCH SKIP PAST LAST ROW (no match)

Please disregard my proposal. Even if we make such a function, it
would always return NULL for unmatched rows or skipped rows, and I
think the function does not solve your problem.

However, I wonder if supporting MEASURES solves the problem either
because any columns defined by MEASURES will return NULL except the
first row in a reduced frame. Can you please show an example how to
identify runs of matched rows using MEASURES?

> For the rest of the mail I need more time to understand. I will reply
> back after studying it. For now, I just want to thank you for the
> valuable information!
>
>> --
>>
>> I've been working on an implementation [1] of SQL/RPR's "parenthesized
>> language" and preferment order. (These are defined in SQL/Foundation
>> 2023, section 9.41.) The tool gives you a way to figure out, for a
>> given pattern, what matches are supposed to be attempted and in what
>> order:
>>
>> $ ./src/test/modules/rpr/rpr_prefer "a b? a"
>> ( ( a ( b ) ) a )
>> ( ( a ( ) ) a )
>>
>> Many simple patterns result in an infinite set of possible matches. So
>> if you use an unbounded quantifiers, you have to also use --max-rows
>> to limit the size of the hypothetical window frame:
>>
>> $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
>> ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
>> ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
>> ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
>> ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
>> ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
>> ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
>> ( ( ^ ( ) ) $ )

I wonder how Oracle solves the problem (an infinite set of possible
matches) without using "--max-rows" or something like that because in
my understanding Oracle supports the regular expressions and PERMUTE.

>> I've found this useful to check my personal understanding of the spec
>> and the match behavior, but it could also potentially be used to
>> generate test cases, or to help users debug their own patterns. For
>> example, a pattern that has a bunch of duplicate sequences in its PL
>> is probably not very well optimized:
>>
>> $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
>> ( ( a a a ) ( a ) )
>> ( ( a a ) ( a a ) )
>> ( ( a a ) ( a ) )
>> ( ( a ) ( a a a ) )
>> ( ( a ) ( a a ) )
>> ( ( a ) ( a ) )
>>
>> And patterns with catastrophic backtracking behavior tend to show a
>> "sawtooth" pattern in the output, with a huge number of potential
>> matches being generated relative to the number of rows in the frame.
>>
>> My implementation is really messy -- it leaks memory like a sieve, and
>> I cannibalized the parser from ECPG, which just ended up as an
>> exercise in teaching myself flex/bison. But if there's interest in
>> having this kind of tool in the tree, I can work on making it
>> reviewable. Either way, I should be able to use it to double-check
>> more complicated test cases.

I definitely am interested in the tool!

>> A while back [2], you were wondering whether our Bison implementation
>> would be able to parse the PATTERN grammar directly. I think this tool
>> proves that the answer is "yes", but PERMUTE in particular causes a
>> shift/reduce conflict. To fix it, I applied the same precedence
>> workaround that we use for CUBE and ROLLUP.

That's a good news!

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Krennwallner 2024-09-30 00:45:50 pg_upgrade check for invalid databases
Previous Message Thomas Munro 2024-09-29 22:28:04 Re: msys inet_pton strangeness