From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu> |
Subject: | Re: speed up verifying UTF-8 |
Date: | 2021-07-30 01:12:33 |
Message-ID: | CAFBsxsHDXCROQe-UC1nZOdcdaCO90rihiYhBYrLHrf_sLKUY=g@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Jul 26, 2021 at 8:56 AM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> > Does that (and "len >= 32" condition) mean the patch does not improve
> > validation of the shorter strings (the ones less than 32 bytes)?
>
> Right. Also, the 32 byte threshold was just a temporary need for testing
> 32-byte stride -- testing different thresholds wouldn't hurt. I'm not
> terribly concerned about short strings, though, as long as we don't
> regress.

I put together the attached quick test to try to rationalize the fast-path
threshold. (In case it isn't obvious, it must be at least 16 on all builds,
since wchar.c doesn't know which implementation it's calling, and SSE
register width sets the lower bound.) I changed the threshold first to 16,
and then 100000, which will force using the byte-at-a-time code.

If we have only 16 bytes in the input, it still seems to be faster to use
SSE, even though it's called through a function pointer on x86. I didn't
test the DFA path, but I don't think the conclusion would be different.
I'll include the 16 threshold next time I need to update the patch.
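
To make the threshold logic concrete, here is a minimal sketch in C of a length-gated fast path. It is not the patch's code: the names FAST_PATH_THRESHOLD, utf8_seq_len, verify_slow, verify_ascii_chunks, fast_verifier, and verify_utf8 are all hypothetical, and the chunk routine is a plain-C stand-in for the SSE implementation that only accepts 16-byte chunks of pure ASCII. The function pointer is there only to mirror the indirect call mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; 16 mirrors the SSE register width discussed above. */
#define FAST_PATH_THRESHOLD 16

/* Length of the valid UTF-8 sequence starting at s, or 0 if it is invalid. */
static size_t
utf8_seq_len(const unsigned char *s, size_t len)
{
    unsigned char c = s[0];

    if (c < 0x80)
        return 1;
    if (c >= 0xC2 && c <= 0xDF)
        return (len >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (c >= 0xE0 && c <= 0xEF)
    {
        if (len < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (c == 0xE0 && s[1] < 0xA0)   /* overlong encoding */
            return 0;
        if (c == 0xED && s[1] > 0x9F)   /* UTF-16 surrogate range */
            return 0;
        return 3;
    }
    if (c >= 0xF0 && c <= 0xF4)
    {
        if (len < 4 || (s[1] & 0xC0) != 0x80 ||
            (s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80)
            return 0;
        if (c == 0xF0 && s[1] < 0x90)   /* overlong encoding */
            return 0;
        if (c == 0xF4 && s[1] > 0x8F)   /* above U+10FFFF */
            return 0;
        return 4;
    }
    return 0;                           /* 0x80..0xC1 and 0xF5..0xFF */
}

/* Byte-at-a-time path: number of leading bytes that form valid UTF-8. */
static size_t
verify_slow(const unsigned char *s, size_t len)
{
    size_t done = 0;

    while (done < len)
    {
        size_t l = utf8_seq_len(s + done, len - done);

        if (l == 0)
            break;
        done += l;
    }
    return done;
}

/* Stand-in for the SIMD path: consume whole 16-byte all-ASCII chunks. */
static size_t
verify_ascii_chunks(const unsigned char *s, size_t len)
{
    size_t done = 0;

    while (len - done >= 16)
    {
        uint64_t a, b;

        memcpy(&a, s + done, 8);
        memcpy(&b, s + done + 8, 8);
        if ((a | b) & UINT64_C(0x8080808080808080))
            break;                      /* a high bit is set: not pure ASCII */
        done += 16;
    }
    return done;
}

/* Indirect call, so the caller need not know which fast path was chosen. */
static size_t (*fast_verifier) (const unsigned char *, size_t) = verify_ascii_chunks;

/* Returns the number of leading bytes of s that are valid UTF-8. */
size_t
verify_utf8(const unsigned char *s, size_t len)
{
    size_t done = 0;

    if (len >= FAST_PATH_THRESHOLD)
        done = fast_verifier(s, len);
    return done + verify_slow(s + done, len - done);
}
```

With this sketch, an 18-byte input made of 16 ASCII bytes followed by one two-byte character has its first chunk consumed by the fast path and the remaining two bytes checked byte-at-a-time, so verify_utf8() returns 18. Raising FAST_PATH_THRESHOLD above the input length sends everything through verify_slow(), which is the same trick as setting the threshold to 100000 in the test.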

Macbook x86, clang 12:

master + use 16:
 asc16 | asc32 | asc64 | mb16 | mb32 | mb64
-------+-------+-------+------+------+------
   270 |   279 |   282 |  291 |  296 |  304

force byte-at-a-time:
 asc16 | asc32 | asc64 | mb16 | mb32 | mb64
-------+-------+-------+------+------+------
   277 |   292 |   310 |  296 |  317 |  362

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
mbverifystr-threshold.sql | application/octet-stream | 1.3 KB |