From: | "R, Rakshit" <rakshit(dot)r(at)intel(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, "Giacchino, Luca" <luca(dot)giacchino(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "Paul, Sourav Kumar" <sourav(dot)kumar(dot)paul(at)intel(dot)com> |
Subject: | RE: SIMD optimization for list_sort |
Date: | 2025-02-14 19:15:45 |
Message-ID: | CO1PR11MB506056215E799493EC5F42E8EDFE2@CO1PR11MB5060.namprd11.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi All,
Thank you very much for your feedback! We investigated the recommendations as we developed the current patch. Please refer to the comments below.
> I'd suggest targeting pg_qsort() directly, instead of list_sort(). list_sort() is not used in very performance critical parts.
Using x86-simd-sort would require a new pg_qsort variant taking an extractor function. The new API would still have to bubble up to functions like list_sort, which internally call pg_qsort. We targeted list_sort first, as we were aware of at least one use case that benefits from SIMD sorting (pgvector). As we target more use cases (such as tuple sort), we may need to move the SIMD-enabled implementation to a common location, and a new variant of pg_qsort may be a good candidate.
> I'd suggest proposing this to the pgvector project directly, since pgvector would immediately benefit.
Yes, we’ll share the performance gains on HNSW index build time with the pgvector community. However, other users of list_sort (e.g., other extensions) may benefit from the simd-sort version of list_sort as well. As it is not a pgvector-specific optimization, we propose it for PostgreSQL.
> If you could use this to speed up tuple sorting, that would be much more interesting for PostgreSQL itself.
Enabling x86-simd-sort for tuple sorting is in development. We observed promising results, but there is still work to do (e.g., to ensure there are no regressions in different conditions). We are planning to share another patch for tuple sort building on this one.
> I continually see performance reports when sorting and group order impact performance. Because of that, efforts have been made to optimise sorts with recombination of clause order. It would be nice to review your Sort operator optimisation. I hope, it might improve performance of the cases provided in these threads.
The submitted patch gives more details on the optimization.
Based on the previously shared proposal, we’re providing a patch for enabling SIMD-based sort for list_sort using the x86-simd-sort library (https://github.com/intel/x86-simd-sort)
As described earlier, we add a new function that takes an extractor function.
void list_sort_simd_float(List *list, list_sort_extractor_float extract, list_sort_comparator cmp);
The extract function retrieves the float values based on which the ListCells are sorted by x86-simd-sort. The comparator function is used to break ties. To break the ties, we call the existing list_sort function, passing the comparator.
The dependency on x86-simd-sort is optional through a --with-simdsort flag. If Postgres is not built with x86-simd-sort, the above function falls back to the existing list_sort.
The current patch provides a SIMD implementation for sorting float values. Support for additional data types (such as int32) will be added in the future.
Note that there are a few unexpected changes in the generated configure file. We will investigate.
An extension is also provided to stress test list_sort (under /src/test/modules/test_listsort). The extension defines a “test_listsort” function that takes 4 parameters. The first is the size of the list to be generated and sorted. The second and the third parameters (“random_number” and “tie_break_limit”) are used to introduce repeats (ties) to better simulate real-world data. The fourth parameter is a boolean flag to select the SIMD-enabled version or the existing list_sort.
To measure performance, we use pgbench to run the function repeatedly. In the first test below, a list of size 100K is generated. To populate each cell, a random float f is generated. If (f % 500) <= 100 (1/5 of the cells), the cell is assigned a number between 0 and 100 (which will cause repeats). The rest of the cells are assigned the original random float (for which repeats are unlikely). In the second example, tie_break_limit is increased to generate more repeats.
The data was collected on an AWS m7i.metal-24xl instance (Ubuntu 22.04, gcc12.3)
tps gains with SIMD-enabled list_sort over existing list_sort:
size=100k, max_random=500, tie_break_limit=100: 2.63x (1 client), 2.15x (48 clients)
size=100k, max_random=500, tie_break_limit=250: 2.09x (1 client), 1.75x (48 clients)
The gains are very promising, and we will characterize more conditions. Please let us know your thoughts.
Thanks!
Best regards,
Rakshit Rajeev
Best regards,
-----Original Message-----
From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Sent: Saturday, November 23, 2024 8:50 AM
To: Giacchino, Luca <luca(dot)giacchino(at)intel(dot)com>; pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: R, Rakshit <rakshit(dot)r(at)intel(dot)com>; Shankaran, Akash <akash(dot)shankaran(at)intel(dot)com>; Devulapalli, Raghuveer <raghuveer(dot)devulapalli(at)intel(dot)com>
Subject: Re: SIMD optimization for list_sort
On 22/11/2024 06:27, Giacchino, Luca wrote:
> We’d appreciate feedback on this approach. In the meantime, we will
> complete the patch to share. We also plan to extend SIMD-based sort to
> tuple sort in the future.
Nice! I continually see performance reports when sorting and group order impact performance. Because of that, efforts have been made to optimise sorts with recombination of clause order [1,2]. It would be nice to review your Sort operator optimisation. I hope, it might improve performance of the cases provided in these threads.
[1] POC: GROUP BY optimization
https://www.postgresql.org/message-id/flat/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
[2] Consider the number of columns in the sort cost model https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com
--
regards, Andrei Lepikhov
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-list_sort-implementation-based-on-x86-simd-so.patch | application/octet-stream | 49.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2025-02-14 19:50:33 | Re: Parameter binding for COPY TO queries |
Previous Message | Andres Freund | 2025-02-14 19:13:04 | Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations |