pgsql: Improve memory management and performance of tuplestore.c

From: David Rowley <drowley(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Improve memory management and performance of tuplestore.c
Date: 2024-07-05 05:51:55
Message-ID: E1sPbrO-000NaK-6T@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Improve memory management and performance of tuplestore.c

Here we make tuplestore.c use a generation.c memory context rather than
allocating tuples into the CurrentMemoryContext, which primarily is the
ExecutorState or PortalHoldContext memory context. Not having a
dedicated context can cause the CurrentMemoryContext context to become
bloated when pfree'd chunks are not reused by future tuples. Using
generation speeds up users of tuplestore.c, such as the Materialize,
WindowAgg and CTE Scan executor nodes. The main reason for the speedup is
due to generation.c being more memory efficient than aset.c memory
contexts. Specifically, generation does not round sizes up to the next
power of 2 value. This both saves memory, allowing more tuples to fit in
work_mem, but also makes the memory usage more compact and fit on fewer
cachelines. One benchmark showed up to a 22% performance increase in a
query containing a Materialize node. Much higher gains are possible if
the memory reduction prevents tuplestore.c from spilling to disk. This is
especially true for WindowAgg nodes where improvements of several thousand
times are possible if the memory reductions made here prevent tuplestore
from spilling to disk.

Additionally, a generation.c memory context is much better suited for this
job as it works well with FIFO palloc/pfree patterns, which is exactly how
tuplestore.c uses it. Because of the way generation.c allocates memory,
tuples consecutively stored in tuplestores are much more likely to be
stored consecutively in memory. This allows the CPU's hardware prefetcher
to work more efficiently as it provides a more predictable pattern to
allow cachelines for the next tuple to be loaded from RAM in advance of
them being needed by the executor.

Using a dedicated memory context for storing tuples also allows us to more
efficiently clean up the memory used by the tuplestore as we can reset or
delete the context rather than looping over all stored tuples and
pfree'ing them one by one.

Also, remove a badly placed USEMEM call in readtup_heap(). The tuple
wasn't being allocated in the Tuplestorestate's context, so no need to
adjust the memory consumed by the tuplestore there.

Author: David Rowley
Reviewed-by: Matthias van de Meent, Dmitry Dolgov
Discussion: https://postgr.es/m/CAApHDvp5Py9g4Rjq7_inL3-MCK1Co2CRt_YWFwTU2zfQix0p4A@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/590b045c37aad44915f7f472343f24c2bafbe5d8

Modified Files
--------------
src/backend/utils/sort/tuplestore.c | 55 +++++++++++++++++++++++++++----------
1 file changed, 40 insertions(+), 15 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-07-05 08:33:33 pgsql: Lift limitation that PGPROC->links must be the first field
Previous Message David Rowley 2024-07-05 04:56:40 pgsql: Fix newly introduced issue in EXPLAIN for Materialize nodes