Re: BUG #18708: regex problem: (?:[^\d\D]){0} asserts with "lp->nouts == 0 && rp->nins == 0"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: dhyan(at)nataraj(dot)su, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18708: regex problem: (?:[^\d\D]){0} asserts with "lp->nouts == 0 && rp->nins == 0"
Date: 2024-11-14 23:36:14
Message-ID: 2242483.1731627374@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Aleksander Alekseev <aleksander(at)timescale(dot)com> writes:
>> If you run
>> SELECT '' ~ '(?:[^\d\D]){0}';
>> it will assert with "lp->nouts == 0 && rp->nins == 0"

> I wonder if the Assert is just wrong or if it's more complicated than that.

Interesting example. I think the bug's actually in my commit
08c0d6ad6 (Invent "rainbow" arcs), although it is unreachable
before 2a0af7fe4. What's happening is:

* "\d\D" can match any character whatsoever. optimizebracket()
notices this and replaces a sheaf of NFA arcs with a single
RAINBOW arc, which is the same representation as for ".".

* Then we apply the complementation process in cbracket(),
and that calls colorcomplement() which does this:

/* A RAINBOW arc matches all colors, making the complement empty */
if (findarc(of, PLAIN, RAINBOW) != NULL)
return;

We thus end up with no arcs from the bracket expression's start
state to its end state. This is not wrong in itself: "you can't
get there from here" is a perfectly reasonable representation
of [^\d\D], which cannot match any character.

* However, the (?: ... ){0} superstructure around that eventually
leads us to

/* annoying special case: {0} or {0,0} cancels everything */
if (m == 0 && n == 0)
{
...
/* Otherwise, we can clean up any subre infrastructure we made */
if (atom != NULL)
freesubre(v, atom);
delsub(v->nfa, lp, rp);
}

which is trying to throw away the detritus from inside the quantified
subexpression. But delsub fails to remove it all, because there's no
path between the two specified states, and its assert notices that.

This is, I believe, not harmful in itself: the leftover unreachable
states/arcs will be cleaned up later, and we end with a valid NFA.

So, as you noted, removing that assert would suppress all visible
symptoms of the problem. But I really don't want to do that; this
code is complicated and we need all the help we can get to find
bugs in it.

So I think the right thing to do is to fix colorcomplement() so
that it still produces some arc between its "from" and "to"
states, keeping the graph connected until we finish parsing the
regex. It has to be a dummy arc that can't match anything, of
course. We can throw away such an arc once we get to regex
optimization, because we don't need delsub() to work anymore.

In the attached draft patch I represented the dummy arc as a PLAIN
arc with a new fake color BLACK. Another way to do it could be to
introduce a new arc "type" value, but that seemed like it would
involve touching more code.

This is admittedly a lot more code than removing one assert,
but I think we'll regret it if we allow delsub failures to go
undetected.

regards, tom lane

Attachment Content-Type Size
fix-bug-18708.patch text/x-diff 7.6 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-11-15 02:14:27 Re: BUG #18708: regex problem: (?:[^\d\D]){0} asserts with "lp->nouts == 0 && rp->nins == 0"
Previous Message Aleksander Alekseev 2024-11-14 18:09:03 Re: Libpq library error when doing physical Replication