Re: Sort optimizations: Making in-memory sort cache-aware

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Sort optimizations: Making in-memory sort cache-aware
Date: 2023-03-02 18:32:21
Message-ID: 15cd9c50-eb5a-2fad-1d01-1c93b836f2c0@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

I took a stab at naive version of this but been stuck for sometime now.

I have added logic to sort on first column at first pass,

realloc all tuples and do full sort at second pass, but I am not seeing

any benefit (it is actually regressing) at all.

Tried doing above both at bulk and at chunks of data.

> You effectively end up with a bounded number of pre-sorted blocks, so
the most
> obvious thing to try is to build a heap of those blocks and
effectively do a
> heapsort between the presorted blocks.

I am not very clear about implementation for this. How can we do
heapsort between

 the presorted blocks? Do we keep changing state->bound=i, i+n, i+2n
something like

this and keep calling make_bounded_heap/sort_bounded_heap?

> A related, but separate, improvement is to reduce / remove the memory
> allocation overhead.

This is still pending from my side.

I have attached some benchmarking results with script and POC

patches (which includes GUC switch to enable optimization for ease of
testing) for the same.

Tested on WORK_MEM=3 GB for 1 and 10 Million rows data.

Please let me know things which I can fix and re-attempt.

Thanks,

Ankit

Attachment Content-Type Size
image/png 14.1 KB
image/png 13.7 KB
image/png 15.0 KB
image/png 15.6 KB
bench_sort_optimization.sh.txt text/plain 1.2 KB
inmem-sort-opt-batch.patch text/x-patch 4.3 KB
inmem-sort-opt.patch text/x-patch 4.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-03-02 18:35:27 Re: Add SHELL_EXIT_CODE to psql
Previous Message Tomas Vondra 2023-03-02 18:27:00 Re: Add LZ4 compression in pg_dump