Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: 2024-08-26 21:59:28
Message-ID: CAEze2WgmyTVyZv+u+D44f+3Xk5TyZm=+4Gp5qdEX14FwBJA1Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 26 Aug 2024 at 23:18, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> > On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > Please find the v22 patches with the above changes.
> >
> > Please find the v23 patches after rebasing 0005 and adapting 0004 for
> > 9758174e2e.
>
>
> Thank you.
>
> 0001 API design:
>
> * Remove TableModifyState.modify_end_callback.
>
> * This patch means that we will either remove or deprecate
> TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
> strong opinions about maintaining support for multi-insert, or should
> we just remove it outright and force any new AMs to implement the new
> APIs to maintain COPY performance?

I don't think there is a significant enough difference in the
capabilities and requirements between the two APIs as currently
designed that removal of the old API would mean a significant
difference in capabilities. Maybe we could supply an equivalent API
shim to help the transition, but I don't think we should keep the old
API around in the TableAM.

> * Why do we need a separate "modify_flags" and "options"? Can't we just
> combine them into TABLE_MODIFY_* flags?
>
>
> Alexander, you had some work in this area as well, such b1484a3f19. I
> believe 0001 covers this use case in a different way: rather than
> giving complete responsibility to the AM to insert into the indexes,
> the caller provides a callback and the AM is responsible for calling it
> at the time the tuples are flushed. Is that right?
>
> The design has been out for a while, so unless others have suggestions,
> I'm considering the major design points mostly settled and I will move
> forward with something like 0001 (pending implementation issues).

Sorry about this late feedback, but while I'm generally +1 on the idea
and primary design, I feel that it doesn't quite cover all the areas
I'd expected it to cover.

Specifically, I'm having trouble seeing how this could be used to
implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
returning output path for the newly inserted tuples' data, which is
usually required for our execution nodes' output path. Is support for
RETURN-clauses planned for this API? In a previous iteration, the
flush operation was capable of returning a TTS, but that seems to have
been dropped, and I can't quite figure out why.

> Note: I believe this API will extend naturally to updates and deletes,
> as well.

I have the same concern about UPDATE ... RETURNING not fitting with
this callback-based design.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Adrian Klaver 2024-08-26 22:10:06 Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
Previous Message nikhil raj 2024-08-26 21:49:04 Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.