Re: Use generation context to speed up tuplesorts

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Use generation context to speed up tuplesorts
Date: 2022-02-06 21:41:25
Message-ID: CAKU4AWq+KRmH9Cb8bsO-+usjxZkn8JsDRGU=fxy8YMFwdD1wOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi:

On Thu, Jan 20, 2022 at 9:31 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

>
> As of now, I still believe we'll need Tomas' patches to allow the
> block size to grow up to a maximum size. I think those patches are
> likely needed before we think about making tuplesort use generation
> contexts. The reason I believe this is that we don't always have good
> estimates for the number of tuples we're going to sort.

+1

I spent some times to study the case here and my current thought is:
we can discuss/commit the minimum committable changes which
should be beneficial for some cases and no harm for others.
Tomas's patch 0002 would make there are no more blocks needed
if we switch to GenerationContext (compared with Standard Context). and
David's patch can obviously reduce total memory usage and improve
the performance. so IMO Tomas's patch 0002 + David's patch is a committable
patchset at first round. and Tomas's 0001 patch would be good to have
as well.

I double checked Tomas's 0002 patch, it looks good to me. and then applied
David's patch with ALLOCSET_DEFAULT_SIZES, testing the same workload.
Here is the result (number is tps):

work_mem = '4GB'

| Test Case | master | patched |
|-----------+--------+---------|
| Test 1 | 306 | 406 |
| Test 2 | 225 | 278 |
| Test 3 | 202 | 248 |
| Test 4 | 184 | 218 |
| Test 5 | 281 | 360 |

work_mem = '4MB'

| Test Case | master | patched |
|-----------+--------+---------|
| Test 1 | 124 | 409 |
| Test 2 | 106 | 280 |
| Test 3 | 100 | 249 |
| Test 4 | 97 | 218 |
| Test 5 | 120 | 369 |

I didn't get the performance improvement as much as David's at the
beginning, I
think that is because David uses the ALLOCSET_DEFAULT_MAXSIZE directly which
will need less number of times for memory allocation.

AFAICS, Tomas's patch 0002 + David's patch should be ready for commit for
round 1.
We can try other opportunities like use rows estimation to allocate initial
memory and
GenerationContext improves like 0003/0004. Would this work?

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2022-02-06 22:30:34 Re: Release notes for February minor releases
Previous Message Andrew Dunstan 2022-02-06 20:57:03 Re: [RFC] building postgres with meson