From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: speed up text_position() for utf-8 |
Date: | 2021-12-17 21:01:37 |
Message-ID: | CAFBsxsFUoxgYQ22xjFTVtq7UAoZbHFTigEeQn3bV=2PCyqgSpw@mail.gmail.com |
Lists: | pgsql-hackers |
Attached is a short patch series that develops some ideas for inlining
pg_utf_mblen().
0001 puts the main implementation of pg_utf_mblen() into an inline
function and uses this in pg_mblen(). This is somewhat faster in the
strpos tests, which gives some measure of the speedup to expect for
other callers. Text search seems to call this a lot, so it might
see a noticeable benefit.
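
To illustrate the idea behind 0001, here is a minimal standalone sketch of a UTF-8 length function written as a static inline, so that a caller in the same translation unit can have it inlined. The name utf8_mblen_inline() is hypothetical and the body is only an approximation of the lead-byte test; it is not the code from the attached patch.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for an inlined pg_utf_mblen(): return the byte
 * length of the UTF-8 sequence whose first byte is *s, judging only by
 * the lead byte.  Not taken from the patch.
 */
static inline int
utf8_mblen_inline(const unsigned char *s)
{
	if ((*s & 0x80) == 0)
		return 1;				/* 0xxxxxxx: ASCII */
	else if ((*s & 0xe0) == 0xc0)
		return 2;				/* 110xxxxx: two-byte sequence */
	else if ((*s & 0xf0) == 0xe0)
		return 3;				/* 1110xxxx: three-byte sequence */
	else if ((*s & 0xf8) == 0xf0)
		return 4;				/* 11110xxx: four-byte sequence */
	else
		return 1;				/* invalid lead byte: advance one byte */
}
```

Because the ASCII case is the first branch, a caller that inlines this pays little more than one test per byte on pure-ASCII input.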
0002 refactors text_position_get_match_pos() to use
pg_mbstrlen_with_len(). This itself is significantly faster when
combined with 0001, likely because the latter can inline the call to
pg_mblen(). The intention is to speed up more than just text_position.
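
As a rough sketch of the approach in 0002, the character position of a match can be derived by counting the characters in the bytes that precede it. The functions below are hypothetical standalone stand-ins (utf8_seq_len(), utf8_strlen_with_len() and match_char_pos() are names made up for this example), they assume valid NUL-terminated UTF-8 input, and they are not the patch itself.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of a UTF-8 sequence from its lead byte. */
static inline int
utf8_seq_len(const unsigned char *s)
{
	if ((*s & 0x80) == 0)
		return 1;
	else if ((*s & 0xe0) == 0xc0)
		return 2;
	else if ((*s & 0xf0) == 0xe0)
		return 3;
	else if ((*s & 0xf8) == 0xf0)
		return 4;
	return 1;					/* invalid lead byte: advance one byte */
}

/*
 * Count the characters in the first 'limit' bytes of s, in the style of
 * pg_mbstrlen_with_len().  Assumes valid UTF-8.
 */
static int
utf8_strlen_with_len(const char *s, int limit)
{
	int			len = 0;

	while (limit > 0 && *s)
	{
		int			l = utf8_seq_len((const unsigned char *) s);

		limit -= l;
		s += l;
		len++;
	}
	return len;
}

/*
 * Hypothetical: 1-based character position of a match, given a pointer to
 * the first matching byte inside haystack.
 */
static int
match_char_pos(const char *haystack, const char *match)
{
	return utf8_strlen_with_len(haystack, (int) (match - haystack)) + 1;
}
```

With the length function visible to the compiler, the loop in utf8_strlen_with_len() contains no function call, which is the kind of effect the timings for 0001 plus 0002 suggest.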
0003 explicitly specializes pg_mbstrlen_with_len() to use the inline
version of pg_utf_mblen(), but this turns out to be almost as slow as
master for ASCII. It doesn't help if I undo the previous change in
pg_mblen(), and I haven't investigated why yet.
0002 looks good now, but the experience with 0003 makes me hesitant to
propose this seriously until I can figure out what's going on there.
The test is the same as before: a worst-case substring search, with times in milliseconds.
patch | no match | ascii | multibyte
--------+----------+-------+-----------
PG11 | 1220 | 1220 | 1150
master | 385 | 2420 | 1980
0001 | 390 | 2180 | 1670
0002 | 389 | 1330 | 1100
0003 | 391 | 2100 | 1360
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v2-0002-Refactor-text_position_get_match_pos-to-use-pg_mb.patch | application/octet-stream | 2.1 KB |
v2-0003-Specialize-pg_mbstrlen_with_len-for-UTF-8.patch | application/octet-stream | 897 bytes |
v2-0001-Move-the-implementation-of-pg_utf_mblen-to-an-inl.patch | application/octet-stream | 4.3 KB |