pgsql: Fix performance regression in tuplesort specializations

From: David Rowley <drowley(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Fix performance regression in tuplesort specializations
Date: 2022-04-22 04:02:42
Message-ID: E1nhkVG-000Wuj-D6@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Fix performance regression in tuplesort specializations

697492434 added 3 new qsort specialization functions aimed to improve the
performance of sorting many of the common pass-by-value data types when
they're the leading or only sort key.

Unfortunately, that has caused a performance regression when sorting
datasets where many of the values being compared were equal. What was
happening here was that we were falling back to the standard sort
comparison function to handle tiebreaks. When the two given Datums
compared equally we would incur both the overhead of an indirect function
call to the standard comparer to perform the tiebreak and also the
standard comparer function would go and compare the leading key needlessly
all over again.

Here improve the situation in the 3 new comparison functions. We now
return 0 directly when the two Datums compare equally and we're performing
a 1-key sort.

Here we don't do anything to help the multi-key sort case where the
leading key uses one of the sort specializations functions. On testing
this case, even when the leading key's values are all equal, there
appeared to be no performance regression. Let's leave it up to future
work to optimize that case so that the tiebreak function no longer
re-compares the leading key over again.

Another possible fix for this would have been to add 3 additional sort
specialization functions to handle single-key sorts for these
pass-by-value types. The reason we didn't do that here is that we may
deem some other sort specialization to be more useful than single-key
sorts. It may be impractical to have sort specialization functions for
every single combination of what may be useful and it was already decided
that further analysis into which ones are the most useful would be delayed
until the v16 cycle. Let's not let this regression force our hand into
trying to make that decision for v15.

Author: David Rowley
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CA+hUKGJRbzaAOUtBUcjF5hLtaSHnJUqXmtiaLEoi53zeWSizeA@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/99c754129d787ea4ce3b34b9f4c5f5e74c45ab6a

Modified Files
--------------
src/backend/utils/sort/tuplesort.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2022-04-22 09:25:17 pgsql: doc: Add links to tables
Previous Message Tom Lane 2022-04-21 21:59:04 pgsql: Remove inadequate assertion check in CTE inlining.