pg_regprefix() doesn't handle lookahead constraints correctly

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-bugs(at)postgreSQL(dot)org
Subject: pg_regprefix() doesn't handle lookahead constraints correctly
Date: 2015-10-19 16:44:55
Message-ID: 1218.1445273095@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Whilst poking around the topic of lookbehind constraints, I realized
that pg_regprefix(), which is code we added to extract any fixed prefix
that a regex might have, is already broken for lookahead constraints.
Observe:

regression=# create table pp (f1 text unique);
CREATE TABLE
regression=# explain select * from pp where f1 ~ '^abcde(f|(?=\d))';
QUERY PLAN
------------------------------------------------------------------------
Bitmap Heap Scan on pp (cost=4.22..14.37 rows=7 width=32)
Filter: (f1 ~ '^abcde(f|(?=\d))'::text)
-> Bitmap Index Scan on pp_f1_key (cost=0.00..4.22 rows=7 width=0)
Index Cond: ((f1 >= 'abcdef'::text) AND (f1 < 'abcdeg'::text))
(4 rows)

It thinks that "f" is part of the fixed prefix of the regex, whereas
in point of fact the regex will match strings that have a digit there.

The reason is that pg_regprefix thinks it can ignore LACON arcs
altogether, whereas really what it had better do is treat them as
a reason to give up searching.

In principle we could traverse across the LACON arc (which consumes no
input) and look to see if the following state has a single outarc color.
But that would greatly complicate the code and I doubt it corresponds to
any case that's of real-world interest. That is, it'd allow us to
recognize that, say, '^abcde(?=\w)f' has a fixed prefix of 'abcdef' not
just 'abcde', but who writes regexes like that? Most likely, a lookahead
constraint is at the end of the regex anyway. (Similarly, it won't be
very interesting to optimize for lookbehind constraints here, because
nobody would write such a thing in a fixed-prefix regex.)

This is such a corner case that it's not too surprising we've had no
field reports about it. But still, a bug is a bug.

regards, tom lane

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2015-10-19 16:54:31 Re: BUG #13687: bug in row_to_json function with '-infinity' timestamp
Previous Message Petr Jelinek 2015-10-19 15:54:47 Re: BUG #13687: bug in row_to_json function with '-infinity' timestamp