From: | John Naylor <jcnaylor(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Speeding up text_position_next with multibyte encodings |
Date: | 2018-12-26 21:45:08 |
Message-ID: | CAJVSVGURXQCk=8tyPJ4JomFRQFOham7e=D8e2twS0PDCoAPpSA@mail.gmail.com |
Lists: | pgsql-hackers |
On 12/22/18, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 14/12/2018 20:20, John Naylor wrote:
> I'm afraid that script doesn't work as a performance test. The
> position() function is immutable, so the result gets cached in the plan
> cache. All you're measuring is the speed to get the constant from the
> plan cache :-(.
That makes perfect sense now. I should have been more skeptical about
the small and medium sizes having similar times. :/
> I rewrote the same tests with a little C helper function, attached, to
> fix that, and to eliminate any PL/pgSQL overhead.
Thanks for that. I'll probably have occasion to do something like this
for other tests.
> You chose interesting characters for the UTF-8 test. The haystack is a
> repeating pattern of byte sequence EC 99 84, and the needle is a
> repeating pattern of EC 84 B1. In the 'long' test, the lengths in the
> skip table are '2', '1' and '250'. But the search bounces between the
> '2' and '1' cases, and never hits the byte that would allow it to skip
> 250 bytes. Interesting case, I had not realized that that can happen.
Me neither, that was unintentional.
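
To illustrate the collision described above, here is a minimal Python sketch of a Horspool-style skip table. It is a simplification for illustration, not the PostgreSQL code: it uses a full 256-entry table with no masking, and the needle length of 240 bytes and the haystack length are assumed values, not the ones from the actual test. The function names are hypothetical.

```python
def build_skip_table(needle):
    """Horspool-style table: how far to advance for each possible byte."""
    table = [len(needle)] * 256          # default: skip the whole needle
    last = len(needle) - 1
    for i in range(last):                # the final needle byte is excluded
        table[needle[i]] = last - i
    return table


def horspool_trace(haystack, needle):
    """Return (match offset or -1, list of skip distances actually used)."""
    table = build_skip_table(needle)
    last = len(needle) - 1
    pos = last                           # haystack index aligned with needle's last byte
    skips = []
    while pos < len(haystack):
        if haystack[pos - last:pos + 1] == needle:
            return pos - last, skips
        step = table[haystack[pos]]      # skip is chosen by the haystack byte at pos
        skips.append(step)
        pos += step
    return -1, skips


# Byte patterns from the discussion; the repeat counts are assumptions.
haystack = bytes.fromhex("ec9984") * 1000
needle = bytes.fromhex("ec84b1") * 80    # 240 bytes

table = build_skip_table(needle)
pos, skips = horspool_trace(haystack, needle)

print(table[0xEC], table[0x84], table[0x99])   # skips for the three haystack bytes
print(pos, sorted(set(skips)))                 # no match; distinct skips taken
```

In this sketch the search position only ever lands on the haystack bytes 0x84 (skip 1) and 0xEC (skip 2), so it never reads 0x99, the one byte that would permit the long skip.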
> But I don't think we need to put much weight on that, you could come up
> with other scenarios where the current code has skip table collisions, too.
Okay.
> So overall, I think it's still a worthwhile tradeoff, given that that is
> a worst case scenario. If you choose less pathological UTF-8 codepoints,
> or there is no match or the match is closer to the beginning of the
> string, the patch wins.
On 12/23/18, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> So, what is the expected speedup in a "good/average" case? Do we have
> some reasonable real-world workload mixing these cases that could be
> used as a realistic benchmark?
I'll investigate some "better" cases.
-John Naylor