Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: 2024-08-26 21:18:21
Message-ID: d079c042180a698179384d6a7a3109cd8310e285.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Please find the v22 patches with the above changes.
>
> Please find the v23 patches after rebasing 0005 and adapting 0004 for
> 9758174e2e.

Thank you.

0001 API design:

* Remove TableModifyState.modify_end_callback.

* This patch means that we will either remove or deprecate
TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
strong opinions about maintaining support for multi-insert, or should
we just remove it outright and force any new AMs to implement the new
APIs to maintain COPY performance?

* Why do we need a separate "modify_flags" and "options"? Can't we just
combine them into TABLE_MODIFY_* flags?

Alexander, you had some work in this area as well, such b1484a3f19. I
believe 0001 covers this use case in a different way: rather than
giving complete responsibility to the AM to insert into the indexes,
the caller provides a callback and the AM is responsible for calling it
at the time the tuples are flushed. Is that right?

The design has been out for a while, so unless others have suggestions,
I'm considering the major design points mostly settled and I will move
forward with something like 0001 (pending implementation issues).

Note: I believe this API will extend naturally to updates and deletes,
as well.

0001 implementation issues:

* We need default implementations for AMs that don't implement the new
APIs, so that the AM will still function even if it only defines the
single-tuple APIs. If we need to make use of the AM's multi_insert
method (I'm not sure we do), then the default methods would need to
handle that as well. (I thought a previous version had these default
implementations -- is there a reason they were removed?)

* I am confused about how the heap implementation manages state and
resets it. mistate->mem_cxt is initialized to a new memory context in
heap_modify_begin, and then re-initialized to another new memory
context in heap_modify_buffer_insert. Then the mistate->mem_cxt is also
used as a temp context for executing heap_multi_insert, and it gets
reset before calling the flush callback, which still needs the slots.

* Why materialize the slot at copyfrom.c:1308 if the slot is going to
be copied anyway (which also materializes it; see
tts_virtual_copyslot()) at heapam.c:2710?

* After correcting the memory issues, can you get updated performance
numbers for COPY?

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-08-26 21:27:15 Re: Fix memory counter update in reorderbuffer
Previous Message Thomas Munro 2024-08-26 21:06:29 Re: Segfault in jit tuple deforming on arm64 due to LLVM issue