Quick Links

Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-06-05 07:12:17
Message-ID:	CALj2ACX90L5Mb5Vv=jsvhOdZ8BVsfpZf-CdCGhtm2N+bGUCSjg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On Thu, May 16, 2024 at 5:01 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> The flushing behavior is entirely controlled by the table AM. The heap
> can use the same flushing logic that it did before, which is to hold
> 1000 tuples.
>
> I like that it's accounting for memory, too, but it doesn't need to be
> overly restrictive. Why not just use work_mem? That should hold 1000
> reasonably-sized tuples, plus overhead.
>
> Even better would be if we could take into account partitioning. That
> might be out of scope for your current work, but it would be very
> useful. We could have a couple new GUCs like modify_table_buffer and
> modify_table_buffer_per_partition or something like that.

I disagree with inventing more GUCs. Instead, I'd vote for just
holding 1000 tuples in buffers for heap AM. This not only keeps the
code and new table AM simple, but also does not cause regression for
COPY. In my testing, 1000 tuples with 1 int and 1 float columns took
40000 bytes of memory (40 bytes each tuple), whereas with 1 int, 1
float and 1 text columns took 172000 bytes of memory (172 bytes each
tuple) bytes which IMO mustn't be a big problem. Thoughts?

> > 1. Try to get the actual tuple sizes excluding header sizes for each
> > column in the new TAM.
>
> I don't see the point in arbitrarily excluding the header.
>
> > v21 also adds code to maintain tuple size for virtual tuple slots.
> > This helps make better memory-based flushing decisions in the new
> > TAM.
>
> That seems wrong. We shouldn't need to change the TupleTableSlot
> structure for this patch.

I dropped these ideas as I went ahead with the above idea of just
holding 1000 tuples in buffers for heap AM.

> Comments on v21:
>
> * All callers specify TM_FLAG_MULTI_INSERTS. What's the purpose?

Previously, the multi insert state was initialized in modify_begin, so
it was then required to differentiate the code path. But, it's not
needed anymore with the lazy initialization of the multi insert state
moved to modify_buffer_insert. I removed it.

> * The only caller that doesn't use TM_FLAG_BAS_BULKWRITE is
> ExecInsert(). What's the disadvantage to using a bulk insert state
> there?

The subsequent read queries will not find the just-now-inserted tuples
in shared buffers as a separate ring buffer is used with bulk insert
access strategy. The multi inserts is nothing but buffering multiple
tuples plus inserting in bulk. So using the bulk insert strategy might
be worth it for INSERT INTO SELECTs too. Thoughts?

> * I'm a bit confused by TableModifyState->modify_end_callback. The AM
> both sets the callback and calls the callback -- why can't the code
> just go into the table_modify_end method?

I came up with modify_end_callback as per the discussion upthread to
use modify_begin, modify_end in future for UPDATE, DELETE and MERGE,
and not use any operation specific flags to clean the state
appropriately. The operation specific state cleaning logic can go to
the modify_end_callback implementation defined by the AM.

> * The code structure in table_modify_begin() (and related) is strange.
> Can it be simplified or am I missing something?

I previously defined these new table AMs as optional, check
GetTableAmRoutine(). And, there was a point upthread to provide
default/fallback implementation to help not fail insert operations on
tables without the new table AMs implemented. FWIW, the default
implementation was just doing the single inserts. The
table_modify_begin and friends need the logic to fallback making the
code there look different than other AMs. However, I now have a
feeling to drop the idea of having fallback implementation and let the
AMs deal with it. Although it might create some friction with various
non-core AM implementations, it keeps this patch simple which I would
vote for. Thoughts?

> * Why are table_modify_state and insert_modify_buffer_flush_context
> globals? What if there are multiple modify nodes in a plan?

Can you please provide the case that can generate multiple "modify
nodes" in a single plan? AFAICS, multiple "modify nodes" in a plan can
exist for both partitioned tables and tables that get created as part
of CTEs. I disabled multi inserts for both of these cases. The way I
disabled for CTEs looks pretty naive - I just did the following. Any
better suggestions here to deal with all such cases?

+ if (operation == CMD_INSERT &&
+ nodeTag(subplanstate) == T_SeqScanState)
+ canMultiInsert = true;

> * Can you explain the design in logical rep?

Multi inserts for logical replication work at the table level. In
other words, all tuple inserts related to a single table within a
transaction are buffered and written to the corresponding table when
necessary. Whenever inserts pertaining to another table arrive, the
buffered tuples related to the previous table are written to the table
before starting the buffering for the new table. Also, the tuples are
written to the table from the buffer when there arrives a non-INSERT
operation, for example, UPDATE/DELETE/TRUNCATE/COMMIT etc. FWIW,
pglogical has the similar multi inserts logic -
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L879.

Please find the v22 patches with the above changes.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v22-0001-Introduce-new-Table-AM-for-multi-inserts.patch	application/x-patch	15.2 KB
v22-0002-Optimize-various-SQL-commands-with-new-multi-ins.patch	application/x-patch	7.4 KB
v22-0003-Optimize-INSERT-INTO-SELECT-with-new-multi-inser.patch	application/x-patch	9.6 KB
v22-0004-Optimize-Logical-Replication-Apply-with-new-mult.patch	application/x-patch	20.1 KB
v22-0005-Use-new-multi-insert-table-AM-for-COPY.patch	application/x-patch	14.0 KB

In response to

Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM at 2024-05-15 23:31:42 from Jeff Davis

Responses

Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM at 2024-08-26 05:39:28 from Bharath Rupireddy

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2024-06-05 07:21:22	Re: Logical Replication of sequences
Previous Message	Wolfgang Walther	2024-06-05 07:08:28	Re: Build with LTO / -flto on macOS