From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | perl: unsafe empty pattern behavior |
Date: | 2024-03-12 17:22:04 |
Message-ID: | 4a1db3c8ea39156dbe4a4f9e166ca9453e05daaa.camel@j-davis.com |
Lists: | pgsql-hackers |
Moved from discussion on -committers:
https://postgr.es/m/0ef325fa06e7a1605c4e119c4ecb637c67e5fb4e.camel@j-davis.com
Summary:
Do not use perl empty patterns like //, qr//, or s//.../; the behavior
is too surprising for perl non-experts. There are a few such uses in
our tests; patch attached. Unfortunately, there is no obvious way to
detect them automatically, so I am just relying on grep. I'm sure there
are others here who know more about perl than I do, so
suggestions/corrections are welcome.
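As a rough illustration of the grep-style search mentioned above, here is a small sketch in perl. The function name find_empty_patterns and the sample lines are hypothetical, not taken from the attached patch. It is a line-based heuristic, not a parser: it can miss patterns that only become empty after interpolation, and it can flag text inside strings or comments.

```perl
use strict;
use warnings;

# Heuristic only (hypothetical helper, not part of the patch): return the
# 1-based numbers of lines that look like they use an empty pattern.
sub find_empty_patterns
{
	my (@lines) = @_;

	# Looks for "=~ //", "!~ //", "=~ m//", "qr//" and "s//".
	# The defined-or operator ($a // $b) is not preceded by =~ or !~,
	# so it is not flagged by this expression.
	my $suspect = qr{(?:[=!]~\s*m?//|\bqr//|\bs//)};

	my @hits;
	for my $i (0 .. $#lines)
	{
		push @hits, $i + 1 if $lines[$i] =~ $suspect;
	}
	return @hits;
}

my @sample = (
	q{if ($x =~ //) { ... }},
	q{my $re = qr//;},
	q{$y =~ s//foo/;},
	q{my $z = $a // $b;},
	q{if ($x =~ /abc/) { ... }},
);

# Lines 1-3 use an empty pattern; lines 4 and 5 do not.
print join(',', find_empty_patterns(@sample)), "\n";
```

Anything it reports still needs a manual look, which is the same limitation as plain grep.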
Long version:
Some may know this already, but we just discovered the dangers of using
empty patterns in perl:
"If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead... If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match)."
https://perldoc.perl.org/perlop#The-empty-pattern-//
In other words, if you have code like:
if ('xyz' =~ //)
{
	print "'xyz' matches //\n";
}
The match will succeed and print, because there's no previous pattern,
so // is a "genuine" empty pattern, which matches the empty string at
the start of any input and therefore always succeeds. Then, if you add
some other code before it:
if ('abc' =~ /abc/)
{
	print "'abc' matches /abc/\n";
}
if ('xyz' =~ //)
{
	print "'xyz' matches //\n";
}
The first match will succeed, but the second match will fail, because
// is treated like /abc/.
On reflection, that does seem very perl-like. But it can cause
surprising action-at-a-distance if not used carefully, especially for
those who aren't experts in perl. It's much safer to just not use the
empty pattern.
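One way to keep "match anything" semantics without writing an empty pattern is to spell the pattern out, for example as /(?:)/ (an empty non-capturing group) or /^/. The sketch below is only an illustration of that idea; the helper name matches_anything is hypothetical, and this is not necessarily what the attached patch does.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str matches an explicit
# "match anything" pattern. The pattern text "(?:)" is not the empty
# string, so perl does not substitute the last successfully matched
# regex for it.
sub matches_anything
{
	my ($str) = @_;
	return ($str =~ /(?:)/) ? 1 : 0;
}

# A successful match first, so that a bare // afterwards is treated
# as /abc/ (the documented empty-pattern behavior).
my $primed = ('abc' =~ /abc/) ? 1 : 0;

# Bare empty pattern: reuses /abc/, which 'xyz' does not match.
my $bare = ('xyz' =~ //) ? 1 : 0;

# Explicit pattern: unaffected by the earlier match.
my $explicit = matches_anything('xyz');

print "primed=$primed bare=$bare explicit=$explicit\n";
```

The explicit form gives the same answer no matter what matched earlier, which removes the action-at-a-distance described above.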
If you use qr// instead
(https://perldoc.perl.org/perlop#qr/STRING/msixpodualn), like:
if ('abc' =~ qr/abc/)
{
	print "'abc' matches /abc/\n";
}
if ('xyz' =~ qr//)
{
	print "'xyz' matches //\n";
}
Then the second match may succeed or may fail, and it's not clear from
the documentation what precise circumstances matter. It seems to fail
on older versions of perl (like 5.16.3) and succeed on newer versions
(5.38.2). However, it may also depend on when the qr// is [re]compiled,
or regex flags, or locale, or may just be undefined.
Regards,
Jeff Davis
Attachment | Content-Type | Size |
---|---|---|
v1-0001-perl-avoid-empty-regex-patterns.patch | text/x-patch | 4.6 KB |