| From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
|---|---|
| To: | Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu> |
| Subject: | Re: speed up verifying UTF-8 |
| Date: | 2021-08-26 15:35:54 |
| Message-ID: | CAFBsxsEdUk96E1QLK1AEd8LudSd6Wo8k+w6_+KYYMgwJKAVy0g@mail.gmail.com |
| Lists: | pgsql-hackers |
I wrote:
> Naively, the shift-based DFA requires 64-bit integers to encode the
> transitions, but I recently came across an idea from Dougall Johnson of
> using the Z3 SMT solver to pack the transitions into 32-bit integers [1].
> That halves the size of the transition table for free. I adapted that
> effort to the existing conventions in v22 and arrived at the attached
> python script.
> [...]
> I'll include something like the attached text file diff in the next
> patch. Some comments are now outdated, but this is good enough for
> demonstration.
Attached is v23 incorporating the 32-bit transition table, with the
necessary comment adjustments.
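
For readers new to the technique, here is a minimal Python sketch of the naive 64-bit shift-based DFA mentioned in the quoted text. It is an illustration only, not the code from the patch: the state names, the 6-bit slot layout, and the helpers `row` and `is_valid_utf8` are assumptions made for this example. Each byte maps to one integer "row", and the current state is the bit offset at which the next state is stored in that row. The solver-generated 32-bit packing is not reproduced here.

```python
# Illustrative sketch of a shift-based DFA for UTF-8 validation.
# Each state is the bit offset of its own 6-bit slot in a 64-bit row,
# so nine states need 54 bits. The numbering is hypothetical.
ERR, BGN, CS1, CS2, CS3, P3A, P3B, P4A, P4B = range(0, 54, 6)
END = BGN  # a complete character returns to the start state


def row(transitions):
    """Pack {current_state: next_state} into one integer row."""
    packed = 0
    for cur, nxt in transitions.items():
        packed |= nxt << cur
    return packed


# Lead bytes: only meaningful in the BGN state.
ASC = row({BGN: END})   # 00..7F
L2A = row({BGN: CS1})   # C2..DF
L3A = row({BGN: P3A})   # E0: next byte must be A0..BF (no overlongs)
L3B = row({BGN: CS2})   # E1..EC, EE..EF
L3C = row({BGN: P3B})   # ED: next byte must be 80..9F (no surrogates)
L4A = row({BGN: P4A})   # F0: next byte must be 90..BF (no overlongs)
L4B = row({BGN: CS3})   # F1..F3
L4C = row({BGN: P4B})   # F4: next byte must be 80..8F (max U+10FFFF)

# Continuation bytes, split by range because of the restricted states.
_CONT = {CS1: END, CS2: CS1, CS3: CS2}
CR1 = row({**_CONT, P3B: CS1, P4B: CS2})  # 80..8F
CR2 = row({**_CONT, P3B: CS1, P4A: CS2})  # 90..9F
CR3 = row({**_CONT, P3A: CS1, P4A: CS2})  # A0..BF

ILL = 0  # every slot is ERR

TABLE = (
    [ASC] * 0x80
    + [CR1] * 0x10
    + [CR2] * 0x10
    + [CR3] * 0x20
    + [ILL] * 2       # C0, C1
    + [L2A] * 30      # C2..DF
    + [L3A]           # E0
    + [L3B] * 12      # E1..EC
    + [L3C]           # ED
    + [L3B] * 2       # EE, EF
    + [L4A]           # F0
    + [L4B] * 3       # F1..F3
    + [L4C]           # F4
    + [ILL] * 11      # F5..FF
)
assert len(TABLE) == 256
assert max(TABLE) < 2**64  # every row fits in a 64-bit integer


def is_valid_utf8(data: bytes) -> bool:
    """Run the DFA: one table load, one shift, one mask per byte."""
    state = BGN
    for byte in data:
        state = (TABLE[byte] >> state) & 63
    return state == END
```

Because ERR is offset 0 and no row sets its low six bits, the error state is sticky without any branch in the loop. Any transition that is not listed in a row is left as zero, which is also ERR.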
--
John Naylor
EDB: http://www.enterprisedb.com
| Attachment | Content-Type | Size |
|---|---|---|
| v23-0001-Add-fast-paths-for-validating-UTF-8-text.patch | application/x-patch | 63.0 KB |