Quick Links

Re: optimize lookups in snapshot [sub]xip arrays

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, John Naylor <jcnaylor(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: optimize lookups in snapshot [sub]xip arrays
Date:	2022-07-15 20:08:57
Message-ID:	20220715200857.nm2gc4weixwlwbnx@awork3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Sounds worth pursuing.

On 2022-07-13 10:09:50 -0700, Nathan Bossart wrote:
> The attached patch has some key differences from the previous proposal.
> For example, the new patch uses simplehash instead of open-coding a new
> hash table.

> The attached patch waits until a lookup of [sub]xip before generating the
> hash table, so we only need to allocate enough space for the current
> elements in the [sub]xip array, and we avoid allocating extra memory for
> workloads that do not need the hash tables.

Hm. Are there any contexts where we'd not want the potential for failing due
to OOM?

I wonder if we additionally / alternatively could use a faster method of
searching the array linearly, e.g. using SIMD.

Another thing that might be worth looking into is to sort the xip/subxip
arrays into a binary-search optimized layout. That'd make the binary search
faster, wouldn't require additional memory (a boolean indicating whether
sorted somewhere, I guess), and would easily persist across copies of the
snapshot.

> I'm slightly worried about increasing the number of memory
> allocations in this code path, but the results above seemed encouraging on
> that front.

ISMT that the test wouldn't be likely to show those issues.

> These hash tables are regarded as ephemeral; they only live in
> process-local memory and are never rewritten, copied, or
> serialized.

What does rewriting refer to here?

Not convinced that's the right idea in case of copying. I think we often end
up copying snapshots frequently, and building & allocating the hashed xids
separately every time seems not great.

> + snapshot->xiph = NULL;
> + snapshot->subxiph = NULL;

Do we need separate hashes for these? ISTM that if we're overflowed then we
don't need ->subxip[h], and if not, then the action for an xid being in ->xip
and ->subxiph is the same?

Greetings,

Andres Freund

In response to

optimize lookups in snapshot [sub]xip arrays at 2022-07-13 17:09:50 from Nathan Bossart

Responses

Re: optimize lookups in snapshot [sub]xip arrays at 2022-07-17 03:59:57 from Nathan Bossart

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jacob Champion	2022-07-15 20:20:59	Re: [PATCH] Log details for client certificate failures
Previous Message	David G. Johnston	2022-07-15 19:59:34	Re: MERGE and parsing with prepared statements