From: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
---|---|
To: | jchampion(at)timescale(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org, vik(at)postgresfriends(dot)org |
Subject: | Re: Row pattern recognition |
Date: | 2023-07-21 06:16:48 |
Message-ID: | 20230721.151648.412762379013769790.t-ishii@sranhm.sra.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
> Hi Ishii-san,
>
> On 7/19/23 22:15, Tatsuo Ishii wrote:
>> Currently my patch has a limitation for the sake of simple
>> implementation: a pattern like "A+" is parsed and analyzed in the raw
>> parser. This makes subsequent process much easier because the pattern
>> element variable (in this case "A") and the quantifier (in this case
>> "+") is already identified by the raw parser. However there are much
>> more cases are allowed in the standard as you already pointed out. For
>> those cases probably we should give up to parse PATTERN items in the
>> raw parser, and instead the raw parser just accepts the elements as
>> Sconst?
>
> Is there a concern that the PATTERN grammar can't be represented in
> Bison? I thought it was all context-free...
I don't know at this point. I think context-free is not enough to be
repsented in Bison. The grammer also needs to be LALR(1). Moreover,
adding the grammer to existing parser may generate shift/reduce
errors.
> Or is the concern that the
> parse tree of the pattern is hard to feed into a regex engine?
One small concern is how to convert pattern variables to regex
expression which our regex enginer understands. Suppose,
PATTERN UP+
Currently I convert "UP+" to "U+" so that it can be fed to the regexp
engine. In order to do that, we need to know which part of the pattern
(UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
for more complex regular expressions it would be not, unless PATTERN
grammer can be analyzed by our parser to know which part is the
pattern variable.
>> Any comments, especially on the PREV/NEXT implementation part is
>> welcome. Currently the DEFINE expression like "price > PREV(price)" is
>> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
>> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR. Then
>> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
>> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
>> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
>> hack and should be gotten ride of before v1 is committed. Better idea?
>
> I'm not familiar enough with this code yet to offer very concrete
> suggestions, sorry... But at some point in the future, we need to be
> able to skip forward and backward from arbitrary points, like
>
> DEFINE B AS B.price > PREV(FIRST(A.price), 3)
>
> so there won't be just one pair of "previous and next" tuples.
Yes, I know.
> Maybe
> that can help clarify the design? It feels like it'll need to eventually
> be a "real" function that operates on the window state, even if it
> doesn't support all of the possible complexities in v1.
Unfortunately an window function can not call other window functions.
> Taking a closer look at the regex engine:
>
> It looks like the + qualifier has trouble when it meets the end of the
> frame. For instance, try removing the last row of the 'stock' table in
> rpr.sql; some of the final matches will disappear unexpectedly. Or try a
> pattern like
>
> PATTERN ( a+ )
> DEFINE a AS TRUE
>
> which doesn't seem to match anything in my testing.
>
> There's also the issue of backtracking in the face of reclassification,
> as I think Vik was alluding to upthread. The pattern
>
> PATTERN ( a+ b+ )
> DEFINE a AS col = 2,
> b AS col = 2
>
> doesn't match a sequence of values (2 2 ...) with the current
> implementation, even with a dummy row at the end to avoid the
> end-of-frame bug.
>
> (I've attached two failing tests against v2, to hopefully better
> illustrate, along with what I _think_ should be the correct results.)
Thanks. I will look into this.
> I'm not quite understanding the match loop in evaluate_pattern(). It
> looks like we're building up a string to pass to the regex engine, but
> by the we call regexp_instr, don't we already know whether or not the
> pattern will match based on the expression evaluation we've done?
For "+" yes. But for more complex regular expression like '{n}', we
need to call our regexp engine to check if the pattern matches.
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
From | Date | Subject | |
---|---|---|---|
Next Message | Gurjeet Singh | 2023-07-21 06:19:54 | Re: MERGE ... RETURNING |
Previous Message | Junwang Zhao | 2023-07-21 06:05:56 | table_open/table_close with different lock mode |