From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Regex performance regression induced by match-all code |
Date: | 2021-05-02 05:46:39 |
Message-ID: | ab0aa846-af42-4fa5-9e05-aba7f11a0b56@www.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, May 1, 2021, at 21:46, Tom Lane wrote:
> I found a nasty performance problem in commit 824bf7190: given the
> right sort of regex, checkmatchall() takes an unreasonable amount
> of time.
Nice catch.
> fix-exponential-cost-of-checkmatchall-1.patch
I've successfully tested the patch on the regex corpus:
SELECT
is_match <> (subject ~ pattern),
captured IS DISTINCT FROM regexp_match(subject, pattern, flags),
COUNT(*)
FROM performance_test
GROUP BY 1,2
ORDER BY 1,2;
?column? | ?column? | count
----------+----------+---------
f | f | 3253889
(1 row)
HEAD (651d005e76bc0b9542615f609b4d0d946035dc58)
Time: 94096.020 ms (01:34.096)
Time: 93102.287 ms (01:33.102)
Time: 93333.746 ms (01:33.334)
fix-exponential-cost-of-checkmatchall-1.patch
Time: 95247.529 ms (01:35.248)
Time: 92617.502 ms (01:32.618)
Time: 93259.700 ms (01:33.260)
I've also tested the problematic type of regexes:
HEAD (651d005e76bc0b9542615f609b4d0d946035dc58)
SELECT regexp_matches('', '(.|){20}','');
Time: 61.613 ms
SELECT regexp_matches('', '(.|){25}','');
Time: 1928.674 ms (00:01.929)
SELECT regexp_matches('', '(.|){27}','');
Time: 7789.601 ms (00:07.790)
fix-exponential-cost-of-checkmatchall-1.patch
SELECT regexp_matches('', '(.|){20}','');
Time: 0.965 ms
SELECT regexp_matches('', '(.|){25}','');
Time: 0.586 ms
SELECT regexp_matches('', '(.|){27}','');
Time: 0.788 ms
Nice improvement, thanks.
/Joel
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2021-05-02 10:30:59 | Re: Enhanced error message to include hint messages for redundant options error |
Previous Message | Julien Rouhaud | 2021-05-02 04:27:37 | Re: Some oversights in query_id calculation |