Re: Insertion Sort Improvements

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Benjamin Coutu <ben(dot)coutu(at)zeyos(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insertion Sort Improvements
Date: 2023-05-24 02:10:00
Message-ID: CAFBsxsHUwd97Wxx+cmyAFZDj4fZpzJmBSyPkCCJFjd0fk1FvWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 23, 2023 at 4:10 PM Benjamin Coutu <ben(dot)coutu(at)zeyos(dot)com> wrote:
>
> Greetings,
>
> I would like to revisit the discussion and concur with John's perspective
that incremental progress through small, successive modifications is the
appropriate approach to move forward. Therefore, I would like to propose
two distinct ideas aimed at enhancing the functionality of insertion sort:
>
> 1. Implementation of a more efficient variant (as described in the
introductory part of this thread):

That's worth trying out. It might also then be worth trying to push both
unordered values -- the big one up / the small one down. I've seen other
implementations do that, but don't remember where, or what it's called.

> 2. It appears that there is a consensus against disregarding the
presorted check, despite its questionable value. In light of this, an
alternative suggestion is to integrate the presorted check into the
insertion sort path by utilizing an unbounded insertion sort.

"Unbounded" means no bounds check on the array. That's not possible in its
current form, so I think you misunderstood something.

> Only after a maximum number of swaps have occurred should we abandon the
sorting process.

I only remember implementations tracking loop iterations, not swaps. You'd
need evidence that this is better.

> If insertion sort is executed on the entire array without any swaps, we
can simply return. If not, and we continue with quicksort after the swap
limit has been reached, we at least have left the array in a more sorted
state, which may likely be advantageous for subsequent iterations.

An important part not mentioned yet: This might only be worth doing if the
previous recursion level performed no swaps during partitioning and the
current pivot candidates are ordered. That's a bit of work and might not be
convenient now -- it'd be trivial with dual-pivot, but I've not looked at
that in a while. (Fittingly, dual-pivot requires a higher insertion sort
threshold so it goes both ways.)

> I would greatly appreciate assistance in benchmarking these proposed
changes. Your collaboration in this matter would be invaluable.

I advise looking in the archives for scripts from previous benchmarks. No
need to reinvent the wheel -- it's enough work as it is. A key thing here
for #1 is that improving insertion sort requires increasing the threshold
to show the true improvement. It's currently 7, but should really be
something like 10. A script that repeats tests for, say, 7 through 18
should show a concave-up shape in the times. The bottom of the bowl should
shift to higher values, and that minimum is what should be compared.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-05-24 02:28:45 Re: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call
Previous Message Stephen Frost 2023-05-24 02:01:03 Re: Docs: Encourage strong server verification with SCRAM