Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: 2024-04-24 12:49:40
Message-ID: CALj2ACX9vVYHYkX8e6w058EuAs8JL5EsnzadTxGhpiE_Ep_ByA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> Here's where I think this API should go:
>
> 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> those that are implemented in your patch.

I added table_modify_begin, table_modify_buffer_insert,
table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
authors now can define their own buffering strategy and flushing decisions
based on their tuple storage kinds and various other AM specific factors. I
also added a default implementation that falls back to single inserts when
no implementation is provided for these AM by AM authors. See the attached
v19-0001 patch.

> 2. Add some kind of flush callback that will be called either while the
> tuples are being flushed or after the tuples are flushed (but before
> they are freed by the AM). (Aside: do we need to call it while the
> tuples are being flushed to get the right visibility semantics for
> after-row triggers?)

I added a flush callback named TableModifyBufferFlushCallback; when
provided by callers invoked after tuples are flushed to disk from the
buffers but before the AM frees them up. Index insertions and AFTER ROW
INSERT triggers can be executed in this callback. See the v19-0001 patch
for how AM invokes the flush callback, and see either v19-0003 or v19-0004
or v19-0005 for how a caller can supply the callback and required context
to execute index insertions and AR triggers.

> 3. Add table_modify_buffer_{update|delete} APIs.
>
> 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete APIs; we don't need a separate merge
> method. This probably requires that the AM maintain 3 separate buffers
> to distinguish different kinds of changes at flush time (obviously
> these can be initialized lazily to avoid overhead when not being used).

I haven't thought about these things yet. I can only focus on them after
seeing how the attached patches go from here.

> 4. Some kind of API tweaks to help manage memory when modifying
> pertitioned tables, so that the buffering doesn't get out of control.
> Perhaps just reporting memory usage and allowing the caller to force
> flushes would be enough.

Heap implementation for thes new Table AMs uses a separate memory context
for all of the operations. Please have a look and let me know if we need
anything more.

> 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> fairly straightforward, I believe, and handled by your patch. Indexes
> are (re)built afterward, and no triggers are possible.
>
> 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> straightforward, I believe, and handled by your patch. No indexes or
> triggers are possible.

I used multi inserts for all of these including TABLE REWRITE commands such
as ALTER TABLE. See the attached v19-0002 patch. Check the testing section
below for benefits.

FWIW, following are some of the TABLE REWRITE commands that can get
benefitted:

ALTER TABLE tbl ALTER c1 TYPE bigint;
ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
ALTER TABLE itest3 ALTER COLUMN a TYPE int;
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int;
and so on.

> 7. Use these new methods for COPY. We have to be careful to avoid
> regressions for the heap method, because it's already managing its own
> buffers. If the AM manages the buffering, then it may require
> additional copying of slots, which could be a disadvantage. To solve
> this, we may need some minor API tweaks to avoid copying when the
> caller guarantees that the memory will not be freed to early, or
> perhaps expose the AM's memory context to copyfrom.c. Another thing to
> consider is that the buffering in copyfrom.c is also used for FDWs, so
> that buffering code path needs to be preserved in copyfrom.c even if
> not used for AMs.

I modified the COPY FROM code to use the new Table AMs, and performed some
tests which show no signs of regression. Check the testing section below
for more details. See the attached v19-0005 patch. With this,
table_multi_insert can be deprecated.

> 8. Use these new methods for INSERT INTO ... SELECT. One potential
> challenge here is that execution nodes are not always run to
> completion, so we need to be sure that the flush isn't forgotten in
> that case.

I did that in v19-0003. I did place the table_modify_end call in multiple
places including ExecEndModifyTable. I didn't find any issues with it.
Please have a look and let me know if we need the end call in more places.
Check the testing section below for benefits.

> 10. Use these new methods for logical apply.

I used multi inserts for Logical Replication apply. in v19-0004. Check the
testing section below for benefits.

FWIW, open-source pglogical does have multi insert support, check code
around
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960
.

> 11. Deprecate the multi_insert API.

I did remove both table_multi_insert and table_finish_bulk_insert in
v19-0006. Perhaps, removing them isn't a great idea, but adding a
deprecation WARNING/ERROR until some more PG releases might be worth
looking at.

> Thoughts on this plan? Does your patch make sense in v17 as a stepping
> stone, or should we try to make all of these API changes together in
> v18?

If the design, code and benefits that these new Table AMs bring to the
table look good, I hope to see it for PG 18.

> Also, a sample AM code would be a huge benefit here. Writing a real AM
> is hard, but perhaps we can at least have an example one to demonstrate
> how to use these APIs?

The attached patches already have implemented these new Table AMs for Heap.
I don't think we need a separate implementation to demonstrate. If others
feel so, I'm open to thoughts here.

Having said above, I'd like to reiterate the motivation behind the new
Table AMs for multi and single inserts.

1. A scan-like API with state being carried across is thought to be better
as suggested by Andres Freund -
https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx@alap3.anarazel.de
.
2. Allowing a Table AM to optimize operations across multiple inserts,
define its own buffering strategy and take its own flushing decisions based
on their tuple storage kinds and various other AM specific factors.
3. Improve performance of various SQL commands with multi inserts for Heap
AM.

The attached v19 patches might need some more detailed comments, some
documentation and some specific tests ensuring the multi inserts for Heap
are kicked-in for various commands. I'm open to thoughts here.

I did some testing to see how various commands benefit with multi inserts
using these new Table AM for heap. It's not only the improvement in
performance these commands see, but also the amount of WAL that gets
generated reduces greatly. After all, multi inserts optimize the insertions
by writing less WAL. IOW, writing WAL record per page if multiple rows fit
into a single data page as opposed to WAL record per row.

Test case 1: 100 million rows, 2 columns (int and float)

Command | HEAD (sec) | PATCHED (sec) | Faster by % |
Faster by X
------------------------------ | ---------- | ------------- | ----------- |
-----------
CREATE TABLE AS | 121 | 77 | 36.3 |
1.57
CREATE MATERIALIZED VIEW | 101 | 49 | 51.4 |
2.06
REFRESH MATERIALIZED VIEW | 113 | 54 | 52.2 |
2.09
ALTER TABLE (TABLE REWRITE) | 124 | 81 | 34.6 |
1.53
COPY FROM | 71 | 72 | 0 |
1
INSERT INTO ... SELECT | 117 | 62 | 47 |
1.88
LOGICAL REPLICATION APPLY | 393 | 306 | 22.1 |
1.28

Command | HEAD (WAL in GB) | PATCHED (WAL in GB) |
Reduced by % | Reduced by X
------------------------------ | ---------------- | ------------------- |
------------ | -----------
CREATE TABLE AS | 6.8 | 2.4 |
64.7 | 2.83
CREATE MATERIALIZED VIEW | 7.2 | 2.3 |
68 | 3.13
REFRESH MATERIALIZED VIEW | 10 | 5.1 |
49 | 1.96
ALTER TABLE (TABLE REWRITE) | 8 | 3.2 |
60 | 2.5
COPY FROM | 2.9 | 3 | 0
| 1
INSERT INTO ... SELECT | 8 | 3 |
62.5 | 2.66
LOGICAL REPLICATION APPLY | 7.5 | 2.3 |
69.3 | 3.26

Test case 2: 1 billion rows, 1 column (int)

Command | HEAD (sec) | PATCHED (sec) | Faster by % |
Faster by X
------------------------------ | ---------- | ------------- | ----------- |
-----------
CREATE TABLE AS | 794 | 386 | 51.38 |
2.05
CREATE MATERIALIZED VIEW | 1006 | 563 | 44.03 |
1.78
REFRESH MATERIALIZED VIEW | 977 | 603 | 38.28 |
1.62
ALTER TABLE (TABLE REWRITE) | 1189 | 714 | 39.94 |
1.66
COPY FROM | 321 | 330 | -0.02 |
0.97
INSERT INTO ... SELECT | 1084 | 586 | 45.94 |
1.84
LOGICAL REPLICATION APPLY | 3530 | 2982 | 15.52 |
1.18

Command | HEAD (WAL in GB) | PATCHED (WAL in GB) |
Reduced by % | Reduced by X
------------------------------ | ---------------- | ------------------- |
------------ | -----------
CREATE TABLE AS | 60 | 12 |
80 | 5
CREATE MATERIALIZED VIEW | 60 | 12 |
80 | 5
REFRESH MATERIALIZED VIEW | 60 | 12 |
80 | 5
ALTER TABLE (TABLE REWRITE) | 123 | 31 |
60 | 2.5
COPY FROM | 12 | 12 | 0
| 1
INSERT INTO ... SELECT | 120 | 24 |
80 | 5
LOGICAL REPLICATION APPLY | 61 | 12 |
80.32 | 5

Test setup:
./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" >
install.log && make -j 8 install > install.log 2>&1 &

wal_level=logical
max_wal_size = 256GB
checkpoint_timeout = 1h

Test system is EC2 instance of type c5.4xlarge:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 7
BogoMIPS: 5999.99
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 8 MiB (8 instances)
L3: 35.8 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
RAM:
MemTotal: 32036536 kB

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v19-0001-Introduce-new-Table-Access-Methods-for-single-an.patch application/x-patch 20.3 KB
v19-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch application/x-patch 7.0 KB
v19-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch application/x-patch 10.1 KB
v19-0004-Optimize-Logical-Replication-apply-with-multi-in.patch application/x-patch 19.9 KB
v19-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch application/x-patch 13.1 KB
v19-0006-Remove-table_multi_insert-and-table_finish_bulk_.patch application/x-patch 6.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-04-24 13:25:09 Re: Remove unnecessary code rom be_lo_put()
Previous Message Andrew Dunstan 2024-04-24 12:35:37 Re: WIP Incremental JSON Parser