Quick Links

Re: optimize lookups in snapshot [sub]xip arrays

From:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	John Naylor <jcnaylor(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: optimize lookups in snapshot [sub]xip arrays
Date:	2022-07-25 19:04:19
Message-ID:	20220725190419.GA4095199@nathanxps13
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sat, Jul 16, 2022 at 08:59:57PM -0700, Nathan Bossart wrote:
> On Fri, Jul 15, 2022 at 01:08:57PM -0700, Andres Freund wrote:
>> I wonder if we additionally / alternatively could use a faster method of
>> searching the array linearly, e.g. using SIMD.
>
> I looked into using SIMD. The patch is attached, but it is only intended
> for benchmarking purposes and isn't anywhere close to being worth serious
> review. There may be a simpler/better way to implement the linear search,
> but this seemed to work well. Overall, SIMD provided a decent improvement.
> I had to increase the number of writers quite a bit in order to demonstrate
> where the hash tables began winning. Here are the numbers:
>
> writers head simd hash
> 256 663 632 694
> 512 530 618 637
> 768 489 544 573
> 1024 364 508 562
> 2048 185 306 485
> 4096 146 197 441
>
> While it is unsurprising that the hash tables perform the best, there are a
> couple of advantages to SIMD that might make that approach worth
> considering. For one, there's really no overhead (i.e., you don't need to
> sort the array or build a hash table), so we can avoid picking an arbitrary
> threshold and just have one code path. Also, a SIMD implementation for a
> linear search through an array of integers could likely be easily reused
> elsewhere.

From the discussion thus far, it seems there is interest in optimizing
[sub]xip lookups, so I'd like to spend some time moving it forward. I
think the biggest open question is which approach to take. Both the SIMD
and hash table approaches seem viable, but I think I prefer the SIMD
approach at the moment (see the last paragraph of quoted text for the
reasons). What do folks think?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

In response to

Re: optimize lookups in snapshot [sub]xip arrays at 2022-07-17 03:59:57 from Nathan Bossart

Responses

Re: optimize lookups in snapshot [sub]xip arrays at 2022-07-26 01:50:07 from Zhang Mingli
Re: optimize lookups in snapshot [sub]xip arrays at 2022-07-26 18:19:06 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Anthony Sotolongo	2022-07-25 19:19:22	Re: Expose Parallelism counters planned/execute in pg_stat_statements
Previous Message	Andrew Dunstan	2022-07-25 18:27:22	Re: very long record lines in expanded psql output