perl: unsafe empty pattern behavior

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: perl: unsafe empty pattern behavior
Date: 2024-03-12 17:22:04
Message-ID: 4a1db3c8ea39156dbe4a4f9e166ca9453e05daaa.camel@j-davis.com
Lists: pgsql-hackers

Moved from discussion on -committers:

https://postgr.es/m/0ef325fa06e7a1605c4e119c4ecb637c67e5fb4e.camel@j-davis.com

Summary:

Do not use perl empty patterns such as //, qr//, or s//.../; the
behavior is too surprising for perl non-experts. There are a few such
uses in our tests; patch attached. Unfortunately, there is no obvious
way to detect them automatically, so I am just relying on grep. I'm sure
there are others here who know more about perl than I do, so
suggestions/corrections are welcome.
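
For what it's worth, the grep can be approximated in perl itself. The
following is only a heuristic sketch: the regex and the
find_suspect_lines name are made up for illustration, it does not parse
perl, and it can both miss cases and report false positives (for
example, a defined-or // written directly after a variable named $s).

```perl
use strict;
use warnings;

# Heuristic only: matches text that looks like an empty pattern, namely
# "=~ //", "!~ m//", "qr//" or "s//". This is an assumption about what is
# worth flagging, not a real parser.
my $suspect = qr{(?:[=!]~\s*m?//|\bqr//|\bs//)};

# Returns the 1-based numbers of the lines in $text that look suspicious.
sub find_suspect_lines
{
    my ($text) = @_;
    my @hits;
    my $lineno = 0;
    foreach my $line (split /\n/, $text)
    {
        $lineno++;
        push @hits, $lineno if $line =~ $suspect;
    }
    return @hits;
}

# Made-up sample input; lines 1, 3 and 4 contain an empty pattern.
my $sample = join("\n",
    q{if ($out =~ //) { fail(); }},
    q{like($out, qr/ready/, 'server ready');},
    q{$line =~ s//x/;},
    q{like($out, qr//, 'anything');});

print join(',', find_suspect_lines($sample)), "\n";
```

On the sample above this flags lines 1, 3 and 4. Anything it reports
still needs to be looked at by hand.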

Long version:

Some may know this already, but we just discovered the dangers of using
empty patterns in perl:

"If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead... If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match)."

https://perldoc.perl.org/perlop#The-empty-pattern-//

In other words, if you have code like:

    if ('xyz' =~ //)
    {
        print "'xyz' matches //\n";
    }

The match will succeed and print, because there's no previous pattern,
so // is a "genuine" empty pattern, which matches the empty string at
the start of any input and therefore always succeeds. Then, if you add
some other code before it:

    if ('abc' =~ /abc/)
    {
        print "'abc' matches /abc/\n";
    }

    if ('xyz' =~ //)
    {
        print "'xyz' matches //\n";
    }

The first match will succeed, but the second match will fail, because
// is treated like /abc/.

On reflection, that does seem very perl-like. But it can cause
surprising action-at-a-distance if not used carefully, especially for
those who aren't experts in perl. It's much safer to just not use the
empty pattern.
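
The perlop text quoted above says "evaluates to the empty string", so
the same thing can happen when a pattern is interpolated from a variable
that turns out to be empty. One way to avoid that, sketched here as an
illustration only, is to wrap the pattern in (?:...) so it is never the
empty string. The matches_safely helper and $pat are hypothetical names,
not something from the patch.

```perl
use strict;
use warnings;

# Returns 1 if $text matches $pattern, else 0. Wrapping the pattern in
# (?:...) means the compiled pattern is never the empty string, so perl
# does not substitute the last successfully matched pattern.
sub matches_safely
{
    my ($text, $pattern) = @_;
    return ($text =~ /(?:$pattern)/) ? 1 : 0;
}

# A successful match, so that a later empty pattern has something to reuse.
my $primed = ('abc' =~ /abc/) ? 1 : 0;

# Hypothetical pattern variable that happens to be empty at run time.
my $pat = '';

# Unguarded: the pattern evaluates to the empty string, so /abc/ is reused.
my $unsafe = ('xyz' =~ /$pat/) ? 1 : 0;

# Guarded: the pattern is "(?:)", which matches the empty string in 'xyz'.
my $safe = matches_safely('xyz', $pat);

print "primed=$primed unsafe=$unsafe safe=$safe\n";
```

With the /abc/ match in front, I would expect this to print
"primed=1 unsafe=0 safe=1". Writing /(?:)/ is also a way to spell
"always matches" explicitly where that is really what a test wants.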

If you use qr// instead:

https://perldoc.perl.org/perlop#qr/STRING/msixpodualn

like:

    if ('abc' =~ qr/abc/)
    {
        print "'abc' matches /abc/\n";
    }

    if ('xyz' =~ qr//)
    {
        print "'xyz' matches //\n";
    }

Then the second match may succeed or may fail, and it's not clear from
the documentation what precise circumstances matter. It seems to fail
on older versions of perl (like 5.16.3) and succeed on newer versions
(5.38.2). However, it may also depend on when the qr// is [re]compiled,
or regex flags, or locale, or may just be undefined.
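
A small probe makes it easy to check what a given perl installation
does. This is a sketch only; probe_empty_qr is a hypothetical name, and
the result is deliberately not shown here because it appears to depend
on the perl version.

```perl
use strict;
use warnings;

# Runs a successful qr/abc/ match and then tries an empty qr// against
# 'xyz'. Returns (primed, matched) as 1/0 flags.
sub probe_empty_qr
{
    my $primed = ('abc' =~ qr/abc/) ? 1 : 0;
    my $matched = ('xyz' =~ qr//) ? 1 : 0;
    return ($primed, $matched);
}

my ($primed, $matched) = probe_empty_qr();

# $^V is the version of the running perl; %vd prints it as e.g. 5.38.2.
printf "perl %vd: primed=%d, 'xyz' =~ qr// %s\n",
  $^V, $primed, ($matched ? 'matched' : 'did not match');
```

Running it under each perl version of interest shows directly which
behavior that version has.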

Regards,
Jeff Davis

Attachment: v1-0001-perl-avoid-empty-regex-patterns.patch (text/x-patch, 4.6 KB)
