From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Sort optimizations: Making in-memory sort cache-aware
Date: 2023-02-11 12:19:02
Message-ID: 444ccfcc-06d6-0160-f662-9fa2075229ad@gmail.com
Lists: pgsql-hackers
Hi all,
While working on sort optimizations for window functions, we observed
that the performance of an in-memory sort (all tuples fit in work_mem)
degrades badly once the number of tuples becomes very large [1].
Eg: work_mem = 4 GB, sort on 4 int columns of a table having 10 million
tuples.
The issues we saw were as follows:
1. The comparetup function re-compares the first key in case of a
tie-break, even though that key has already been compared.
2. Frequent cache misses.
Issue #1 is being addressed in a separate patch. I am currently looking at #2.
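
To make issue #1 concrete, here is a simplified sketch of the pattern
(the names and layout are mine for illustration, not the actual
tuplesort code):

/* Fast path compares the cached first key; on a tie it calls the full
 * comparator, which today starts over at key 0 and so compares the
 * first key a second time. */
typedef struct
{
    int  first_key;     /* cached copy of sort key 0 (datum1 in tuplesort) */
    int *all_keys;      /* all keys of the full tuple */
    int  nkeys;
} SortTuple;

static int
compare_full(const SortTuple *a, const SortTuple *b, int start_key)
{
    for (int k = start_key; k < a->nkeys; k++)
        if (a->all_keys[k] != b->all_keys[k])
            return a->all_keys[k] < b->all_keys[k] ? -1 : 1;
    return 0;
}

static int
sort_cmp(const SortTuple *a, const SortTuple *b)
{
    if (a->first_key != b->first_key)       /* fast path */
        return a->first_key < b->first_key ? -1 : 1;

    /* current behavior is effectively compare_full(a, b, 0), which
     * redoes key 0; the fix amounts to starting the tie-break at key 1 */
    return compare_full(a, b, 1);
}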
A possible solution is to batch tuples into groups (sized to fit in the
L3 cache) before pushing them to the sort function, as in the sketch
below.
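
As a rough illustration of the batching idea (all sizes, types, and
names here are assumptions for the sketch, not measured values or
existing tuplesort code): sort the array in L3-sized runs so each qsort
call works on a cache-resident chunk, then merge the runs.

#include <stdlib.h>

typedef struct { int keys[4]; } Tuple;      /* stand-in for a SortTuple */

static int
tuple_cmp(const void *a, const void *b)
{
    const Tuple *ta = a, *tb = b;
    for (int i = 0; i < 4; i++)
        if (ta->keys[i] != tb->keys[i])
            return ta->keys[i] < tb->keys[i] ? -1 : 1;
    return 0;
}

/* Run length chosen so a run fits comfortably in an assumed 32 MB L3. */
#define RUN_TUPLES ((32u * 1024 * 1024) / sizeof(Tuple) / 2)

static void
sort_in_cache_sized_runs(Tuple *a, size_t n)
{
    for (size_t off = 0; off < n; off += RUN_TUPLES)
    {
        size_t len = (n - off < RUN_TUPLES) ? n - off : RUN_TUPLES;
        qsort(a + off, len, sizeof(Tuple), tuple_cmp);
    }
    /* merge step elided: the same k-way merge tuplesort already uses
     * for on-disk runs, applied here to in-memory runs instead */
}

The open question is whether the extra merge pass costs more than the
cache misses it avoids.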
I have looked at different papers on this (multi-quicksort,
memory-tuned quicksort, samplesort, and various distributed sorts).
Although they look promising (especially samplesort), I would like to
get more input, as the changes look rather steep and may or may not be
in scope for solving the actual problem at hand.
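
For reference, here is the samplesort idea in its simplest single-key
form. This is a toy sketch on plain ints, not a proposal for tuplesort
as-is; the sample size, bucket count, and cache budget are arbitrary
assumptions.

#include <stdlib.h>
#include <string.h>

#define NBUCKETS  8
#define CACHE_N   (32u * 1024 * 1024 / sizeof(int))  /* assumed L3 budget */

static int
int_cmp(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* One level of samplesort: sample, pick splitters, scatter into buckets,
 * then recurse until a bucket fits the cache budget. */
static void
samplesort(int *a, size_t n)
{
    if (n <= CACHE_N)
    {
        qsort(a, n, sizeof(int), int_cmp);
        return;
    }

    /* choose NBUCKETS-1 splitters from a small sorted sample (crude
     * sampling via rand(), good enough for a sketch) */
    int sample[64];
    for (int i = 0; i < 64; i++)
        sample[i] = a[(size_t) rand() % n];
    qsort(sample, 64, sizeof(int), int_cmp);

    int splitters[NBUCKETS - 1];
    for (int i = 0; i < NBUCKETS - 1; i++)
        splitters[i] = sample[(i + 1) * 64 / NBUCKETS];

    /* count bucket sizes, then scatter into a scratch array */
    size_t count[NBUCKETS] = {0}, start[NBUCKETS], pos[NBUCKETS];
    int *tmp = malloc(n * sizeof(int));

    for (size_t i = 0; i < n; i++)
    {
        int b = 0;
        while (b < NBUCKETS - 1 && a[i] > splitters[b])
            b++;
        count[b]++;
    }
    start[0] = 0;
    for (int b = 1; b < NBUCKETS; b++)
        start[b] = start[b - 1] + count[b - 1];
    memcpy(pos, start, sizeof(pos));

    for (size_t i = 0; i < n; i++)
    {
        int b = 0;
        while (b < NBUCKETS - 1 && a[i] > splitters[b])
            b++;
        tmp[pos[b]++] = a[i];
    }
    memcpy(a, tmp, n * sizeof(int));
    free(tmp);

    /* recurse per bucket; guard against a degenerate split (e.g. all
     * keys equal), where recursion would not shrink the problem */
    for (int b = 0; b < NBUCKETS; b++)
    {
        if (count[b] == n)
            qsort(a + start[b], count[b], sizeof(int), int_cmp);
        else
            samplesort(a + start[b], count[b]);
    }
}

The appeal is that each bucket is sorted while it is still resident in
cache; the cost is the extra classification/scatter pass and, for
tuplesort, the work of generalizing this to multi-key comparators and
abbreviated keys.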
Please let me know your opinions: do we really need to rethink
quicksort for this use case, or can we optimize without a major change
to the core sorting algorithm? Are we open to trying new algorithms for
sort?
Any suggestions to narrow down the search space for this problem are
welcome.
Thanks,
Ankit