Re: Use generation memory context for tuplestore.c

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use generation memory context for tuplestore.c
Date: 2024-05-10 16:36:14
Message-ID: CAEze2Wi-rXYha+0mU66Sn3-2FdZS7mgSJj=o_msPr95rHLAUaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 4 May 2024 at 04:02, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Sat, 4 May 2024 at 03:51, Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > Was a bump context considered? If so, why didn't it make the cut?
> > If tuplestore_trim is the only reason why the type of context in patch
> > 2 is a generation context, then couldn't we make the type of context
> > conditional on state->eflags & EXEC_FLAG_REWIND, and use a bump
> > context if we require rewind capabilities (i.e. where _trim is never
> > effectively executed)?
>
> I didn't really want to raise all this here, but to answer why I
> didn't use bump...
>
> There's a bit more that would need to be done to allow bump to work in
> use-cases where no trim support is needed. Namely, if you look at
> writetup_heap(), you'll see a heap_free_minimal_tuple(), which is
> pfreeing the memory that was allocated for the tuple in either
> tuplestore_puttupleslot(), tuplestore_puttuple() or
> tuplestore_putvalues(). So basically, what happens if we're still
> loading the tuplestore and we've spilled to disk, once the
> tuplestore_put* function is called, we allocate memory for the tuple
> that might get stored in RAM (we don't know yet), but then call
> tuplestore_puttuple_common() which decides if the tuple goes to RAM or
> disk, then because we're spilling to disk, the write function pfree's
> the memory we allocate in the tuplestore_put function after the tuple
> is safely written to the disk buffer.

Thanks, that's exactly the non-obvious issue I was looking for, but
couldn't find immediately.

> This is a fairly inefficient design. While, we do need to still form
> a tuple and store it somewhere for tuplestore_putvalues(), we don't
> need to do that for a heap tuple. I think it should be possible to
> write directly from the table's page.

[...]

> I'd rather tackle these problems independently and I believe there are
> much bigger wins to moving from aset to generation than generation to
> bump, so that's where I've started.

That's entirely reasonable, and I wouldn't ask otherwise.

Thanks,

Matthias van de Meent

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-05-10 16:50:54 Re: First draft of PG 17 release notes
Previous Message Dmitry Dolgov 2024-05-10 16:34:36 Re: Use generation memory context for tuplestore.c