Quick Links

GSOC 2018 Project - A New Sorting Routine

From:	Kefan Yang <starordust(at)gmail(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	GSOC 2018 Project - A New Sorting Routine
Date:	2018-07-10 21:02:02
Message-ID:	CAF448YLZz5cD+w5qYQGMcfFPY9zAu2mkkiCtOPSBJmwbjuPNig@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello, Hackers!

I am working on my project in Google Summer of Code 2018
<https://wiki.postgresql.org/wiki/GSoC_2018#Sorting_algorithms_benchmark_and_implementation_.282018.29>.
In this project, I am trying to improve the in-memory sorting routine in
PostgreSQL. Now I am very excited to share my progress with you guys.

Originally, PostgreSQL is using the QuickSort implemented by J. L. Bentley
and M. D. McIlroy in "Engineering a sort function" with some modifications.
This sorting routine is very fast, yet may fall to O(n^2) time complexity
in the worst case scenario. We are trying to find faster sorting algorithms
with guaranteed O(nlogn) time complexity.

In this patch, I

1. Use IntroSort to implement pg_qsort. IntroSort is a hybrid sorting
algorithm. It uses Quicksort most of the time, but switch to insertion sort
when the array is small and heapsort when the recursion exceeds depth limit.
2. Only check if the array is preordered once on the whole array to get
better overall performance. Previously the sorting routine checks if the
array is preordered on every recursion.

After some performance test, I find the new sorting routine

1. Slightly faster on sorting random arrays.
2. Much faster on worst case scenario since it has O(nlogn) worst case
complexity.
3. Has nearly the same performance on mostly sorted arrays.

I use both standalone tests and pgbench to show the result. A more detailed
report is in the attachment, along with the patch and some scripts to
reproduce the result.

*Notice:* this patch may fail a test case in ‘make check’ because QuickSort
and HeapSort are unstable. The two sorting routines may give different
results if multiple entries in the array have the same key value.

Generally speaking, I think the new sorting routine seems to be an
improvement in many ways. Please let me know if you have any thoughts or
suggestions.

Regards,

Kefan Yang

Attachment	Content-Type	Size
NewSorting.zip	application/x-zip-compressed	1.4 MB

Responses

Re: GSOC 2018 Project - A New Sorting Routine at 2018-07-13 00:50:52 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Heikki Linnakangas	2018-07-10 21:25:28	Re: Optimze usage of immutable functions as relation
Previous Message	Tomas Vondra	2018-07-10 20:56:48	Re: cost_sort() improvements