Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: 2024-08-29 19:29:43
Message-ID: d1217312e89c7f59489f90613fcbc9e36ac4afbf.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2024-08-29 at 12:55 +0530, Bharath Rupireddy wrote:
> IMO, every caller branching out in the code like if (rel->rd_tableam-
> >
> tuple_modify_buffer_insert != NULL) then multi insert; else single
> insert; doesn't look good. IMO, the default implementation approach
> keeps things simple which eventually can be removed in *near* future.
> Thoughts?

I believe we need the branching in the caller anyway:

1. If there is a BEFORE row trigger with a volatile function, the
visibility rules[1] mean that the function should see changes from all
the rows inserted so far this command, which won't work if they are
still in the buffer.

2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
that the function should see all previous rows inserted.

3. If there are volatile functions in the target list or WHERE clause,
the same visibility semantics apply.

4. If there's a "RETURNING ctid" clause, we need to either come up with
a way to return the tuples after flushing, or we need to use the
single-tuple path. (Similarly in the future when we support UPDATE ...
RETURNING, as Matthias pointed out.)

If we need two paths in each caller anyway, it seems cleaner to just
wrap the check for tuple_modify_buffer_insert in
table_modify_buffer_enabled().

We could perhaps use a one path and then force a batch size of one or
something, which is an alternative, but we have to be careful not to
introduce a regression (and it still requires a solution for #4).

Regards,
Jeff Davis

[1]
https://www.postgresql.org/docs/devel/trigger-datachanges.html

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Guillaume Lelarge 2024-08-29 20:08:23 Add parallel columns for pg_stat_statements
Previous Message John H 2024-08-29 19:26:04 Re: Allow logical failover slots to wait on synchronous replication