From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: PG15 beta1 sort performance regression due to Generation context change
Date: 2022-05-23 21:01:15
Message-ID: 1bab4553-4c66-9e4e-4725-4593b0a87b5a@enterprisedb.com
Lists: pgsql-hackers
FYI your responses seem to break threading quite often, due to missing
headers identifying the message you're responding to (In-Reply-To,
References). I'm not sure why or how to fix it, but it makes the
discussion much harder to follow.
On 5/22/22 21:11, Ranier Vilela wrote:
> Hi David,
>
>>Over the past few days I've been gathering some benchmark results
>>together to show the sort performance improvements in PG15 [1].
>
>>One of the test cases I did was to demonstrate Heikki's change to use
>>a k-way merge (65014000b).
>
>>The test I did to try this out was along the lines of:
>
>>set max_parallel_workers_per_gather = 0;
>>create table t (a bigint not null, b bigint not null, c bigint not
>>null, d bigint not null, e bigint not null, f bigint not null);
>
>>insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x;
> -- 10GB!
>>vacuum freeze t;
>
>>The query I ran was:
>
>>select * from t order by a offset 140247142;
>
> I redid this test here:
> Windows 10 64 bits
> msvc 2019 64 bits
> RAM 8GB
> SSD 256 GB
>
> HEAD (default configuration)
> Time: 229396,551 ms (03:49,397)
> PATCHED:
> Time: 220887,346 ms (03:40,887)
>
This is 10x longer than the times David reported. Presumably David used
a machine with a lot of RAM, while your system has 8GB and so is I/O bound.
Also, what exactly does "patched" mean? The patch you attached?
>>I tested various sizes of work_mem starting at 4MB and doubled that
>>all the way to 16GB. For many of the smaller values of work_mem the
>>performance is vastly improved by Heikki's change, however for
>>work_mem = 64MB I detected quite a large slowdown. PG14 took 20.9
>>seconds and PG15 beta 1 took 29 seconds!
>
>>I've been trying to get to the bottom of this today and finally have
>>discovered this is due to the tuple size allocations in the sort being
>>exactly 64 bytes. Prior to 40af10b57 (Use Generation memory contexts
>>to store tuples in sorts) the tuple for the sort would be stored in an
>>aset context. After 40af10b57 we'll use a generation context. The
>>idea with that change is that the generation context does no
>>power-of-2 round ups for allocations, so we save memory in most cases.
>>However, due to this particular test having a tuple size of 64-bytes,
>>there was no power-of-2 wastage with aset.
>
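
To make the rounding concrete, here is a simplified sketch of the
power-of-2 round-up aset.c performs for small chunks (illustrative
only -- the real code computes a freelist index, but the effect is
the same):

    #include <stddef.h>

    /* Illustrative only -- not the actual aset.c code. */
    static size_t
    round_up_pow2(size_t request)
    {
        size_t  size = 8;       /* smallest aset chunk size */

        while (size < request)
            size <<= 1;         /* double until the request fits */
        return size;
    }

    /* round_up_pow2(64) == 64  -> no waste for this test's tuples  */
    /* round_up_pow2(72) == 128 -> 56 bytes wasted per tuple        */

generation.c hands out exactly the requested size instead, which is
where the memory savings come from.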
>>The problem is that generation chunks have a larger chunk header than
>>aset do due to having to store the block pointer that the chunk
>>belongs to so that GenerationFree() can increment the nfree chunks in
>>the block. aset.c does not require this as freed chunks just go onto a
>>freelist that's global to the entire context.
>
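
Roughly, the chunk headers look like this on 64-bit builds (simplified
sketches -- the real structs have extra fields under
MEMORY_CONTEXT_CHECKING and differ in layout):

    /* aset.c -- ~16 bytes per chunk (simplified) */
    typedef struct AllocChunkData
    {
        Size    size;   /* size of the chunk */
        void   *aset;   /* owning context */
    } AllocChunkData;

    /* generation.c -- ~24 bytes per chunk (simplified) */
    typedef struct GenerationChunk
    {
        GenerationBlock   *block;   /* block owning this chunk, needed
                                     * by GenerationFree() */
        GenerationContext *context; /* owning context */
        Size               size;    /* size of the chunk */
    } GenerationChunk;

So each 64-byte tuple carries ~8 more bytes of overhead in the
generation context.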
>>Basically, for my test query, the slowdown is because instead of being
>>able to store 620702 tuples per tape over 226 tapes with an aset
>>context, we can now only store 576845 tuples per tape resulting in
>>requiring 244 tapes when using the generation context.
>
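
Back-of-envelope, that matches. Assuming a 16 vs. 24 byte chunk header
and ~24 bytes per SortTuple slot (rough assumptions, ignoring block
headers and other overhead):

    /* Illustrative arithmetic, not taken from the sort code. */
    #include <stdio.h>

    int
    main(void)
    {
        long    work_mem = 64L * 1024 * 1024;   /* 64MB */
        long    tuple = 64;                     /* tuple size in the test */
        long    slot = 24;                      /* SortTuple array entry */

        /* prints 645277 */
        printf("aset:       %ld tuples\n", work_mem / (tuple + 16 + slot));
        /* prints 599186 */
        printf("generation: %ld tuples\n", work_mem / (tuple + 24 + slot));
        return 0;
    }

That's ~7% fewer tuples per run, in the same ballpark as the
620702 -> 576845 drop above.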
>>If I had added column "g" to make the tuple size 72 bytes causing
>>aset's code to round allocations up to 128 bytes and generation.c to
>>maintain the 72 bytes then the sort would have stored 385805 tuples
>>over 364 batches for aset and 538761 tuples over 261 batches using the
>>generation context. That would have been a huge win.
>
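
The same arithmetic shows why this case flips to a big win (same
assumed header/slot sizes as above):

    /* aset:       72 rounds up to 128; 128 + 16 + 24 = 168 bytes/tuple */
    /* generation: 72 stays 72;          72 + 24 + 24 = 120 bytes/tuple */
    /* => generation fits ~40% more tuples in work_mem, consistent with
     *    the 385805 -> 538761 jump described above. */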
>>So it basically looks like I discovered a very bad case that causes a
>>significant slowdown. Yet other cases that are not an exact power of
>>2 stand to gain significantly from this change.
>
>>One thing 40af10b57 does is stops those terrible performance jumps
>>when the tuple size crosses a power-of-2 boundary. The performance
>>should be more aligned to the size of the data being sorted now...
>>Unfortunately, that seems to mean regressions for large sorts with
>>power-of-2 sized tuples.
>
> It seems to me that the solution would be to use aset allocations
> when the size of the tuples is a power of 2?
>
> if (state->sortopt & TUPLESORT_ALLOWBOUNDED ||
>     (state->memtupsize & (state->memtupsize - 1)) == 0)
>     state->tuplecontext = AllocSetContextCreate(state->sortcontext,
>                                                 "Caller tuples", ALLOCSET_DEFAULT_SIZES);
> else
>     state->tuplecontext = GenerationContextCreate(state->sortcontext,
>                                                   "Caller tuples", ALLOCSET_DEFAULT_SIZES);
>
I'm pretty sure this is pointless, because memtupsize is the size of the
memtuples array, but the issue is about the size of the tuples. After
all, David was talking about 64B chunks, while the array always has at
least 1024 elements, so it obviously can't be the same thing.
How would we even know how large the tuples will be at this point,
before we even see the first of them?
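
For reference, the relevant fields (abridged from tuplesort.c):

    SortTuple  *memtuples;      /* array of SortTuple structs */
    int         memtupcount;    /* number of tuples currently present */
    int         memtupsize;     /* allocated length of memtuples array */

So a check against memtupsize says nothing about tuple sizes.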
> I took a look and tried some improvements to see if I had a better result.
>
IMHO special-casing should be the last resort, because it makes the
behavior much harder to follow. Also, we're talking about sort here, but
don't other places using the Generation context have the same issue?
It'd be preferable to find a fix addressing all those places.

regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company