From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joel Jacobson <joel(at)compiler(dot)org> |
Subject: | Re: Another regexp performance improvement: skip useless paren-captures |
Date: | 2021-08-10 01:11:14 |
Message-ID: | 3730031.1628557874@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> writes:
>> On Aug 9, 2021, at 4:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There is a potentially interesting definitional question:
>> what exactly ought this regexp do?
>> ((.)){0}\2
>> Because the capturing paren sets are zero-quantified, they will
>> never be matched to any characters, so the backref can never
>> have any defined referent.
> Perl regular expressions are not POSIX, but if there is a principled reason POSIX should differ from perl on this, we should be clear what that is:
> if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
> {
> print "captured 1 $1\n" if defined $1;
> print "captured 2 $2\n" if defined $2;
> print "captured 3 $3\n" if defined $3;
> print "captured 4 $4\n" if defined $4;
> print "match = $match\n" if defined $match;
> }
Hm. I'm not sure that this example proves anything about Perl's handling
of the situation, since you didn't use a backref. I tried both
if ('foo' =~ m/((.)){0}\1/)
if ('foo' =~ m/((.)){0}\2/)
and while neither throws an error, they don't succeed either.
So AFAICS Perl is acting in the way I'm attributing to POSIX.
But maybe we should actually read POSIX ...
>> ... I guess Spencer did think about this to some extent -- he
>> just forgot about the possibility of nested parens.
> Ugg. That means our code throws an error where perl does not, pretty
> well negating my point above. If we're already throwing an error for
> this type of thing, I agree we should be consistent about it. My
> personal preference would have been to do the same thing as perl, but it
> seems that ship has already sailed.
Removing an error case is usually an easier sell than adding one.
However, the fact that the simplest case (viz, '(.){0}\1') has always
thrown an error and nobody has complained in twenty-ish years suggests
that nobody much cares.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Dilger | 2021-08-10 01:17:40 | Re: Another regexp performance improvement: skip useless paren-captures |
Previous Message | Masahiko Sawada | 2021-08-10 01:01:02 | Re: Small documentation improvement for ALTER SUBSCRIPTION |