From: | Jacob Champion <jchampion(at)timescale(dot)com> |
---|---|
To: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org, vik(at)postgresfriends(dot)org |
Subject: | Re: Row pattern recognition |
Date: | 2023-07-20 23:36:37 |
Message-ID: | 022a392f-2a9e-3df1-a509-a5bd51160fa5@timescale.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Ishii-san,
On 7/19/23 22:15, Tatsuo Ishii wrote:
> Currently my patch has a limitation for the sake of simple
> implementation: a pattern like "A+" is parsed and analyzed in the raw
> parser. This makes subsequent process much easier because the pattern
> element variable (in this case "A") and the quantifier (in this case
> "+") is already identified by the raw parser. However there are much
> more cases are allowed in the standard as you already pointed out. For
> those cases probably we should give up to parse PATTERN items in the
> raw parser, and instead the raw parser just accepts the elements as
> Sconst?
Is there a concern that the PATTERN grammar can't be represented in
Bison? I thought it was all context-free... Or is the concern that the
parse tree of the pattern is hard to feed into a regex engine?
> Any comments, especially on the PREV/NEXT implementation part is
> welcome. Currently the DEFINE expression like "price > PREV(price)" is
> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR. Then
> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
> hack and should be gotten ride of before v1 is committed. Better idea?
I'm not familiar enough with this code yet to offer very concrete
suggestions, sorry... But at some point in the future, we need to be
able to skip forward and backward from arbitrary points, like
DEFINE B AS B.price > PREV(FIRST(A.price), 3)
so there won't be just one pair of "previous and next" tuples. Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.
--
Taking a closer look at the regex engine:
It looks like the + qualifier has trouble when it meets the end of the
frame. For instance, try removing the last row of the 'stock' table in
rpr.sql; some of the final matches will disappear unexpectedly. Or try a
pattern like
PATTERN ( a+ )
DEFINE a AS TRUE
which doesn't seem to match anything in my testing.
There's also the issue of backtracking in the face of reclassification,
as I think Vik was alluding to upthread. The pattern
PATTERN ( a+ b+ )
DEFINE a AS col = 2,
b AS col = 2
doesn't match a sequence of values (2 2 ...) with the current
implementation, even with a dummy row at the end to avoid the
end-of-frame bug.
(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I _think_ should be the correct results.)
I'm not quite understanding the match loop in evaluate_pattern(). It
looks like we're building up a string to pass to the regex engine, but
by the we call regexp_instr, don't we already know whether or not the
pattern will match based on the expression evaluation we've done?
Thanks,
--Jacob
Attachment | Content-Type | Size |
---|---|---|
tests.diff.txt | text/plain | 3.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Vik Fearing | 2023-07-21 00:07:44 | Re: Row pattern recognition |
Previous Message | John Morris | 2023-07-20 23:13:29 | Re: Atomic ops for unlogged LSN |