From: | David Rowley <dgrowley(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Optimize WindowAgg's use of tuplestores |
Date: | 2024-07-07 10:57:17 |
Message-ID: | CAHoyFK9n-QCXKTUWT_xxtXninSMEv+gbJN66-y6prM3f4WkEHw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
As part of making tuplestores faster [1], I noticed that in WindowAgg, when
we end one partition we call tuplestore_end() and then we do
tuplestore_begin_heap() again for the next partition in begin_partition()
and then go on to set up the tuplestore read pointers according to what's
required for the given frameOptions of the WindowAgg. This might make
sense if the frameOptions could change between partitions, but they can't,
so I don't see any reason why we can't just do tuplestore_clear() at the
end of a partition. That resets the read pointer positions back to the
start again ready for the next partition.
I wrote the attached patch and checked how it affects performance. It helps
quite a bit when there are lots of partitions.
CREATE TABLE a (a INT NOT NULL);
INSERT INTO a SELECT x FROM generate_series(1,1000000)x;
VACUUM FREEZE ANALYZE a;
bench.sql:
SELECT a,count(*) OVER (PARTITION BY a) FROM a OFFSET 1000000;
master:
$ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
latency average = 293.488 ms
latency average = 295.509 ms
latency average = 297.772 ms
patched:
$ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
latency average = 203.234 ms
latency average = 204.538 ms
latency average = 203.877 ms
About 45% faster.
Here's the top of perf top of each:
master:
8.61% libc.so.6 [.] _int_malloc
6.71% libc.so.6 [.] _int_free
6.42% postgres [.] heap_form_minimal_tuple
6.40% postgres [.] tuplestore_alloc_read_pointer
5.87% libc.so.6 [.] unlink_chunk.constprop.0
3.86% libc.so.6 [.] __memmove_avx512_unaligned_erms
3.80% postgres [.] AllocSetFree
3.51% postgres [.] spool_tuples
3.45% postgres [.] ExecWindowAgg
2.09% postgres [.] tuplesort_puttuple_common
1.92% postgres [.] comparetup_datum
1.88% postgres [.] AllocSetAlloc
1.85% postgres [.] tuplesort_heap_replace_top
1.84% postgres [.] ExecStoreBufferHeapTuple
1.69% postgres [.] window_gettupleslot
patched:
8.95% postgres [.] ExecWindowAgg
8.10% postgres [.] heap_form_minimal_tuple
5.16% postgres [.] spool_tuples
4.03% libc.so.6 [.] __memmove_avx512_unaligned_erms
4.02% postgres [.] begin_partition
3.11% postgres [.] tuplesort_puttuple_common
2.97% postgres [.] AllocSetAlloc
2.96% postgres [.] comparetup_datum
2.83% postgres [.] tuplesort_heap_replace_top
2.79% postgres [.] window_gettupleslot
2.22% postgres [.] AllocSetFree
2.11% postgres [.] MemoryContextReset
1.99% postgres [.] LogicalTapeWrite
1.98% postgres [.] advance_windowaggregate
Lots less malloc/free work going on.
I'm also tempted to do a cleanup of the state machine inside
nodeWindowAgg.c as I had to add another bool flag to WindowAggState to
replace the previous test that checked if the tuplestore was empty (i.e
just finished a partition) with if (winstate->buffer == NULL). I couldn't
do that anymore due to no longer nullifying that field. I think such a
cleanup would be a separate patch. Annoyingly the extra bool is the 9th
bool flag and widens that struct by 8 bytes and leaves a 7 byte hole.
David
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Optimize-WindowAgg-s-use-of-tuplestores.patch | application/octet-stream | 8.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2024-07-07 11:28:28 | Re: 010_pg_basebackup.pl vs multiple filesystems |
Previous Message | Andrew Dunstan | 2024-07-07 10:30:57 | Re: tests fail on windows with default git settings |