Re: Row pattern recognition

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: vik(at)postgresfriends(dot)org, pgsql-hackers(at)postgresql(dot)org, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org
Subject: Re: Row pattern recognition
Date: 2024-09-27 22:27:07
Message-ID: CAOYmi+ns3kHjC83ap_BCfJCL0wfO5BJ_sEByOEpgNOrsPhqQTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 18, 2024 at 10:00 PM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
> Attached are the v22 patches. Just rebased.

Thanks!

With some bigger partitions, I hit an `ERROR: wrong pos: 1024`. A
test that reproduces it is attached.

While playing with the feature, I've been trying to identify runs of
matched rows by eye. But it's pretty difficult -- the best I can do is
manually count rows using a `COUNT(*) OVER ...`. So I'd like to
suggest that MEASURES be part of the eventual v1 feature, if there's
no other way to determine whether a row was skipped by a previous
match. (That was less obvious to me before the fix in v17.)

--

I've been working on an implementation [1] of SQL/RPR's "parenthesized
language" and preferment order. (These are defined in SQL/Foundation
2023, section 9.41.) The tool gives you a way to figure out, for a
given pattern, what matches are supposed to be attempted and in what
order:

$ ./src/test/modules/rpr/rpr_prefer "a b? a"
( ( a ( b ) ) a )
( ( a ( ) ) a )

Many simple patterns result in an infinite set of possible matches. So
if you use an unbounded quantifiers, you have to also use --max-rows
to limit the size of the hypothetical window frame:

$ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
( ( ^ ( ) ) $ )

I've found this useful to check my personal understanding of the spec
and the match behavior, but it could also potentially be used to
generate test cases, or to help users debug their own patterns. For
example, a pattern that has a bunch of duplicate sequences in its PL
is probably not very well optimized:

$ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
( ( a a a ) ( a ) )
( ( a a ) ( a a ) )
( ( a a ) ( a ) )
( ( a ) ( a a a ) )
( ( a ) ( a a ) )
( ( a ) ( a ) )

And patterns with catastrophic backtracking behavior tend to show a
"sawtooth" pattern in the output, with a huge number of potential
matches being generated relative to the number of rows in the frame.

My implementation is really messy -- it leaks memory like a sieve, and
I cannibalized the parser from ECPG, which just ended up as an
exercise in teaching myself flex/bison. But if there's interest in
having this kind of tool in the tree, I can work on making it
reviewable. Either way, I should be able to use it to double-check
more complicated test cases.

A while back [2], you were wondering whether our Bison implementation
would be able to parse the PATTERN grammar directly. I think this tool
proves that the answer is "yes", but PERMUTE in particular causes a
shift/reduce conflict. To fix it, I applied the same precedence
workaround that we use for CUBE and ROLLUP.

Thanks again!
--Jacob

[1] https://github.com/jchampio/postgres/tree/dev/rpr
[2] https://www.postgresql.org/message-id/20230721.151648.412762379013769790.t-ishii%40sranhm.sra.co.jp

Attachment Content-Type Size
rpr-large-partitions.diff.txt text/plain 2.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-09-27 23:33:13 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Nathan Bossart 2024-09-27 21:24:50 Re: MAINTAIN privilege -- what do we need to un-revert it?