Re: optimize lookups in snapshot [sub]xip arrays

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, John Naylor <jcnaylor(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimize lookups in snapshot [sub]xip arrays
Date: 2022-08-05 04:02:15
Message-ID: CAFBsxsGmNefB-LAo8hNXLDAwWN0vHXc6H2A+NfFUQrj=rn50DQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 5, 2022 at 5:15 AM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:
>
> On Thu, Aug 04, 2022 at 02:58:14PM +0700, John Naylor wrote:
> > Were you considering adding the new function to simd.h now that that's
> > committed? It's a bit up in the air what should go in there, but this
new
> > function is low-level and generic enough to be a candidate...
>
> I don't have a strong opinion. I went with a separate file because I
> envisioned a variety of possible linear search functions (e.g., char,
> uint16, uint32), and some might not use SIMD instructions. Futhermore, it
> seemed less obvious to look in simd.h for linear search functions.

That is a good point. Maybe potential helpers in simd.h should only deal
specifically with vector registers, with it's users providing C fallbacks.
I don't have any good ideas of where to put the new function, though.

> > I wonder if the "pg_" prefix is appropriate here, as that is most often
> > used for things that hide specific details *and* where the base name
would
> > clash, like OS calls or C library functions. I'm not quite sure where
the
> > line is drawn, but I mean that "linearsearch" is a generic algorithm and
> > not a specific API we are implementing, if that makes sense.
>
> Yeah, I was concerned about clashing with lsearch() and lfind(). I will
> drop the prefix.

Hmm, I didn't know about those. lfind() is similar enough that it would
make sense to have pg_lfind32() etc in src/include/port/pg_lsearch.h, at
least for the v4 version that returns the pointer. We already declare
bsearch_arg() in src/include/port.h and that's another kind of array
search. Returning bool is different enough to have a different name.
pg_lfind32_ispresent()? *_noptr()? Meh.

Having said all that, the man page under BUGS [1] says "The naming is
unfortunate."

> > Out of curiosity do we know how much we get by loading four registers
> > rather than two?
>
> The small program I've been using for testing takes about 40% more time
> with the two register approach. The majority of this test involves
> searching for elements that either don't exist in the array or that live
> near the end of the array, so this is probably close to the worst case.

Ok, sounds good.

[1] https://man7.org/linux/man-pages/man3/lsearch.3.html#BUGS

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2022-08-05 04:05:20 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Previous Message Peter Geoghegan 2022-08-05 03:55:33 Re: BTMaxItemSize seems to be subtly incorrect