From: | Jacob Champion <jchampion(at)timescale(dot)com> |
---|---|
To: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Cc: | er(at)xs4all(dot)nl, vik(at)postgresfriends(dot)org, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Row pattern recognition |
Date: | 2023-09-11 22:13:43 |
Message-ID: | CAAWbhmjq3NY1+Am-QHJ4AFh7mi=2eiiGqj518f3-j-C3EfffPg@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Sep 9, 2023 at 4:21 AM Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> wrote:
> Then we will get for str_set:
> r0: B
> r1: AB
>
> Because r0 only has classifier B, r1 can have A and B. Problem is,
> r2. If we choose A at r1, then r2 = B. But if we choose B at r1, then
> r2 = AB. I guess this is the issue you pointed out.
Right.
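
To make the dependency concrete, here is a minimal Python sketch. The `candidates` function and its hard-coded table are hypothetical and only mirror the r0/r1/r2 example quoted above; nothing here comes from the patch or from a real DEFINE clause.

```python
# Hypothetical illustration only: the candidate table is hard-coded to
# mirror the r0/r1/r2 example quoted above.

def candidates(row, prev_choice):
    """Return the pattern variables that row `row` may be classified as,
    given the variable chosen for the previous row."""
    if row == 0:
        return {"B"}
    if row == 1:
        return {"A", "B"}
    if row == 2:
        # r2 depends on what was picked at r1.
        return {"B"} if prev_choice == "A" else {"A", "B"}
    raise ValueError("only rows 0..2 are modelled")


def enumerate_classifications(nrows=3):
    """Enumerate every consistent assignment of variables to rows."""
    results = []

    def walk(row, chosen):
        if row == nrows:
            results.append("".join(chosen))
            return
        prev = chosen[-1] if chosen else None
        for var in sorted(candidates(row, prev)):
            walk(row + 1, chosen + [var])

    walk(0, [])
    return results
```

A single precomputed set for r2 cannot represent this: if r2 were stored as AB unconditionally, the sequence BAA would look valid even though choosing A at r1 rules out A at r2.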
> Yeah, probably we have to delay evaluation of such pattern variables like
> A, then reevaluate A after the first scan.
>
> What about leaving this (reevaluation) for now? Because:
>
> 1) we don't have CLASSIFIER
> 2) we don't allow CLASSIFIER to be given to PREV as its argument
>
> so I think we don't need to worry about this for now.
Sure. I'm all for deferring features to make it easier to iterate; I
just want to make sure the architecture doesn't hit a dead end. Or at
least, not without being aware of it.
Also: is CLASSIFIER the only way to run into this issue?
> What if we don't follow the standard, instead we follow POSIX EREs? I
> think this is better for users unless RPR's REs have significant merit
> for users.
Piggybacking off of what Vik wrote upthread, I think we would not be
doing ourselves any favors by introducing a non-compliant
implementation that performs worse than a traditional NFA. Those would
be some awful bug reports.
> > - I think we have to implement a parallel parser regardless (RPR PATTERN
> > syntax looks incompatible with POSIX)
>
> I am not sure if we need to worry about this because of the reason I
> mentioned above.
Even if we adopted POSIX NFA semantics, we'd still have to implement
our own parser for the PATTERN part of the query. I don't think
there's a good way for us to reuse the parser in src/backend/regex.
> > Does that seem like a workable approach? (Worst-case, my code is just
> > horrible, and we throw it in the trash.)
>
> Yes, it seems workable. I think the first cut of RPR needs at
> least the + quantifier with reasonable performance. The current naive
> implementation seems to have an issue because of exhaustive search.
+1
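
For reference, a rough Python sketch of what a set-of-states (NFA-style) scan could look like for patterns that only use the + quantifier. The function name `longest_match` and the data layout are assumptions made for illustration, not the patch's code. It also assumes each row's candidate set is fixed up front, so it ignores the reevaluation problem discussed above.

```python
def longest_match(pattern, str_set):
    """pattern: list of (variable, quantifier); quantifier is "" or "+".
    str_set: one set of candidate variables per row.
    Returns the number of rows in the longest match starting at row 0,
    or 0 if nothing matches."""
    last = len(pattern) - 1
    states = {-1}  # -1 means "no pattern element consumed yet"
    best = 0
    for nrows, row_vars in enumerate(str_set, start=1):
        nxt = set()
        for i in states:
            # Stay on element i if it is a "+" element and the row qualifies.
            if i >= 0 and pattern[i][1] == "+" and pattern[i][0] in row_vars:
                nxt.add(i)
            # Advance to element i + 1 if the row qualifies for it.
            if i < last and pattern[i + 1][0] in row_vars:
                nxt.add(i + 1)
        if not nxt:
            break  # every thread died; no longer match is possible
        states = nxt
        if last in states:
            best = nrows  # some thread has consumed the whole pattern
    return best
```

Each row is visited once and the live-state set never holds more than one entry per pattern element, so the work is bounded by rows times pattern length instead of growing with the number of classifier combinations.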
Thanks!
--Jacob