From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Some regular-expression performance hacking |
Date: | 2021-03-06 18:09:25 |
Message-ID: | 20210306180925.GA2345664@rfd.leadboat.com |
Lists: | pgsql-hackers |
On Sat, Feb 13, 2021 at 06:19:34PM +0100, Joel Jacobson wrote:
> To test the correctness of the patches,
> I thought it would be nice to have some real-life regexes,
> and just as important, some real-life text strings,
> to which the real-life regexes are applied.
>
> I therefore patched Chromium's V8 regex engine,
> to log the actual regexes that get compiled when
> visiting websites, and also the text strings that
> the regexes are applied to during run-time
> when the regexes are executed.
>
> I logged the regex and text strings as base64 encoded
> strings to STDOUT, to make it easy to grep out the data,
> so it could be imported into PostgreSQL for analytics.
>
> In total, I scraped the first page of some ~50k websites,
> which produced 45M test rows to import,
> which, when grouped by pattern and flags, was reduced
> down to 235k different regex patterns,
> and 1.5M different text string subjects.
It's great to see this kind of testing. Thanks for doing it.
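
For readers who want to picture the pipeline described in the quoted text, here is a minimal sketch of the decode-and-deduplicate step. The message does not give the actual log format, so the tab-separated layout (base64 pattern, flags, base64 subject) and the names `parse_log_line` and `summarize` are assumptions made for illustration. The original work did the grouping in PostgreSQL; this only mimics the idea in Python.

```python
import base64
from collections import Counter


def parse_log_line(line):
    """Decode one log line.

    Assumed (hypothetical) layout: base64(pattern) TAB flags TAB base64(subject).
    """
    pattern_b64, flags, subject_b64 = line.rstrip("\n").split("\t")
    pattern = base64.b64decode(pattern_b64).decode("utf-8", errors="replace")
    subject = base64.b64decode(subject_b64).decode("utf-8", errors="replace")
    return pattern, flags, subject


def summarize(lines):
    """Count rows per distinct (pattern, flags) pair and collect distinct subjects.

    This corresponds roughly to grouping by pattern and flags, as described
    in the quoted text.
    """
    pattern_counts = Counter()
    subjects = set()
    for line in lines:
        pattern, flags, subject = parse_log_line(line)
        pattern_counts[(pattern, flags)] += 1
        subjects.add(subject)
    return pattern_counts, subjects
```

Base64 output never contains a tab, so splitting on tabs is safe under the assumed layout even when the pattern or subject itself contains whitespace.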