From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | "Giacchino, Luca" <luca(dot)giacchino(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | "R, Rakshit" <rakshit(dot)r(at)intel(dot)com>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
Subject: | Re: SIMD optimization for list_sort |
Date: | 2024-11-22 16:01:22 |
Message-ID: | a0a02eba-ff3a-48e8-bc18-c8dbd4521e0e@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 22/11/2024 01:27, Giacchino, Luca wrote:
> The existing list_sort takes a comparator function to compare pairs of
> ListCell. On the other hand, x86-simd-sort requires an array of numeric
> values to sort, and it returns an array of sorted indices. To enable
> x86-simd-sort, we add new list_sort_simd functions that take an
> extractor function. The function extracts a value (float or uint32) from
> a ListCell. Those values are then collected into an array for x86-simd-
> sort to work on. A comparator function can still be passed to be used as
> tie-breaker.
>
> typedef float (*list_sort_extractor_float)(const ListCell *a);
>
> typedef uint32 (*list_sort_extractor_uint32)(const ListCell *a);
>
> void list_sort_simd_float(List *list, list_sort_extractor_float extract,
> list_sort_comparator cmp);
>
> void list_sort_simd_uint32(List *list, list_sort_extractor_uint32
> extract, list_sort_comparator cmp);
>
> These functions will exist alongside the current list_sort. Existing
> list_sort use cases in Postgres or extensions will not be affected by
> default, and they can be converted to list_sort_simd functions where it
> makes sense in terms of performance.
I'd suggest targeting pg_qsort() directly, instead of list_sort().
list_sort() is not used in very performance critical parts.
> We identified a first use case for list_sort_simd_float in pgvector. As
> part of HNSW index construction, pgvector uses list_sort to sort
> candidate vectors by distance. Using list_sort_simd_float, we observed
> reduction in index build time in some scenarios. For example, we
> observed 7% reduction in index build time with the gist-960 dataset and
> 10% with the dbpedia-openai-1000k dataset (ANN-Benchmarks, HNSW index
> with halfvec, m=80). We are also looking into microbenchmarks to measure
> list_sort performance independently.
That's interesting. I'd suggest proposing this to the pgvector project
directly, since pgvector would immediately benefit.
> We’d appreciate feedback on this approach. In the meantime, we will
> complete the patch to share. We also plan to extend SIMD-based sort to
> tuple sort in the future.
If you could use this to speed up tuple sorting, that would be much more
interesting for PostgreSQL itself.
--
Heikki Linnakangas
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2024-11-22 16:06:40 | Re: Offsets of `struct Port` are no longer constant |
Previous Message | Anthonin Bonnefoy | 2024-11-22 15:30:45 | Re: Consider pipeline implicit transaction as a transaction block |