From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Subject: | Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM |
Date: | 2024-04-24 16:07:03 |
Message-ID: | CAFj8pRBNrZv6m-BdnU9hkBQvufdQkKdM1SCmxbax3drX_jjdyw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
st 24. 4. 2024 v 14:50 odesílatel Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> napsal:
> On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >
> > Here's where I think this API should go:
> >
> > 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> > those that are implemented in your patch.
>
> I added table_modify_begin, table_modify_buffer_insert,
> table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
> authors now can define their own buffering strategy and flushing decisions
> based on their tuple storage kinds and various other AM specific factors. I
> also added a default implementation that falls back to single inserts when
> no implementation is provided for these AM by AM authors. See the attached
> v19-0001 patch.
>
> > 2. Add some kind of flush callback that will be called either while the
> > tuples are being flushed or after the tuples are flushed (but before
> > they are freed by the AM). (Aside: do we need to call it while the
> > tuples are being flushed to get the right visibility semantics for
> > after-row triggers?)
>
> I added a flush callback named TableModifyBufferFlushCallback; when
> provided by callers invoked after tuples are flushed to disk from the
> buffers but before the AM frees them up. Index insertions and AFTER ROW
> INSERT triggers can be executed in this callback. See the v19-0001 patch
> for how AM invokes the flush callback, and see either v19-0003 or v19-0004
> or v19-0005 for how a caller can supply the callback and required context
> to execute index insertions and AR triggers.
>
> > 3. Add table_modify_buffer_{update|delete} APIs.
> >
> > 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> > the buffer_insert/update/delete APIs; we don't need a separate merge
> > method. This probably requires that the AM maintain 3 separate buffers
> > to distinguish different kinds of changes at flush time (obviously
> > these can be initialized lazily to avoid overhead when not being used).
>
> I haven't thought about these things yet. I can only focus on them after
> seeing how the attached patches go from here.
>
> > 4. Some kind of API tweaks to help manage memory when modifying
> > pertitioned tables, so that the buffering doesn't get out of control.
> > Perhaps just reporting memory usage and allowing the caller to force
> > flushes would be enough.
>
> Heap implementation for thes new Table AMs uses a separate memory context
> for all of the operations. Please have a look and let me know if we need
> anything more.
>
> > 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> > fairly straightforward, I believe, and handled by your patch. Indexes
> > are (re)built afterward, and no triggers are possible.
> >
> > 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> > straightforward, I believe, and handled by your patch. No indexes or
> > triggers are possible.
>
> I used multi inserts for all of these including TABLE REWRITE commands
> such as ALTER TABLE. See the attached v19-0002 patch. Check the testing
> section below for benefits.
>
> FWIW, following are some of the TABLE REWRITE commands that can get
> benefitted:
>
> ALTER TABLE tbl ALTER c1 TYPE bigint;
> ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
> ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
> ALTER TABLE itest3 ALTER COLUMN a TYPE int;
> ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
> ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int;
> and so on.
>
> > 7. Use these new methods for COPY. We have to be careful to avoid
> > regressions for the heap method, because it's already managing its own
> > buffers. If the AM manages the buffering, then it may require
> > additional copying of slots, which could be a disadvantage. To solve
> > this, we may need some minor API tweaks to avoid copying when the
> > caller guarantees that the memory will not be freed to early, or
> > perhaps expose the AM's memory context to copyfrom.c. Another thing to
> > consider is that the buffering in copyfrom.c is also used for FDWs, so
> > that buffering code path needs to be preserved in copyfrom.c even if
> > not used for AMs.
>
> I modified the COPY FROM code to use the new Table AMs, and performed some
> tests which show no signs of regression. Check the testing section below
> for more details. See the attached v19-0005 patch. With this,
> table_multi_insert can be deprecated.
>
> > 8. Use these new methods for INSERT INTO ... SELECT. One potential
> > challenge here is that execution nodes are not always run to
> > completion, so we need to be sure that the flush isn't forgotten in
> > that case.
>
> I did that in v19-0003. I did place the table_modify_end call in multiple
> places including ExecEndModifyTable. I didn't find any issues with it.
> Please have a look and let me know if we need the end call in more places.
> Check the testing section below for benefits.
>
> > 10. Use these new methods for logical apply.
>
> I used multi inserts for Logical Replication apply. in v19-0004. Check the
> testing section below for benefits.
>
> FWIW, open-source pglogical does have multi insert support, check code
> around
> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960
> .
>
> > 11. Deprecate the multi_insert API.
>
> I did remove both table_multi_insert and table_finish_bulk_insert in
> v19-0006. Perhaps, removing them isn't a great idea, but adding a
> deprecation WARNING/ERROR until some more PG releases might be worth
> looking at.
>
> > Thoughts on this plan? Does your patch make sense in v17 as a stepping
> > stone, or should we try to make all of these API changes together in
> > v18?
>
> If the design, code and benefits that these new Table AMs bring to the
> table look good, I hope to see it for PG 18.
>
> > Also, a sample AM code would be a huge benefit here. Writing a real AM
> > is hard, but perhaps we can at least have an example one to demonstrate
> > how to use these APIs?
>
> The attached patches already have implemented these new Table AMs for
> Heap. I don't think we need a separate implementation to demonstrate. If
> others feel so, I'm open to thoughts here.
>
> Having said above, I'd like to reiterate the motivation behind the new
> Table AMs for multi and single inserts.
>
> 1. A scan-like API with state being carried across is thought to be better
> as suggested by Andres Freund -
> https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx@alap3.anarazel.de
> .
> 2. Allowing a Table AM to optimize operations across multiple inserts,
> define its own buffering strategy and take its own flushing decisions based
> on their tuple storage kinds and various other AM specific factors.
> 3. Improve performance of various SQL commands with multi inserts for Heap
> AM.
>
> The attached v19 patches might need some more detailed comments, some
> documentation and some specific tests ensuring the multi inserts for Heap
> are kicked-in for various commands. I'm open to thoughts here.
>
> I did some testing to see how various commands benefit with multi inserts
> using these new Table AM for heap. It's not only the improvement in
> performance these commands see, but also the amount of WAL that gets
> generated reduces greatly. After all, multi inserts optimize the insertions
> by writing less WAL. IOW, writing WAL record per page if multiple rows fit
> into a single data page as opposed to WAL record per row.
>
> Test case 1: 100 million rows, 2 columns (int and float)
>
> Command | HEAD (sec) | PATCHED (sec) | Faster by %
> | Faster by X
> ------------------------------ | ---------- | ------------- | -----------
> | -----------
> CREATE TABLE AS | 121 | 77 | 36.3
> | 1.57
> CREATE MATERIALIZED VIEW | 101 | 49 | 51.4
> | 2.06
> REFRESH MATERIALIZED VIEW | 113 | 54 | 52.2
> | 2.09
> ALTER TABLE (TABLE REWRITE) | 124 | 81 | 34.6
> | 1.53
> COPY FROM | 71 | 72 | 0
> | 1
> INSERT INTO ... SELECT | 117 | 62 | 47
> | 1.88
> LOGICAL REPLICATION APPLY | 393 | 306 | 22.1
> | 1.28
>
> Command | HEAD (WAL in GB) | PATCHED (WAL in GB) |
> Reduced by % | Reduced by X
> ------------------------------ | ---------------- | ------------------- |
> ------------ | -----------
> CREATE TABLE AS | 6.8 | 2.4 |
> 64.7 | 2.83
> CREATE MATERIALIZED VIEW | 7.2 | 2.3 |
> 68 | 3.13
> REFRESH MATERIALIZED VIEW | 10 | 5.1 |
> 49 | 1.96
> ALTER TABLE (TABLE REWRITE) | 8 | 3.2 |
> 60 | 2.5
> COPY FROM | 2.9 | 3 |
> 0 | 1
> INSERT INTO ... SELECT | 8 | 3 |
> 62.5 | 2.66
> LOGICAL REPLICATION APPLY | 7.5 | 2.3 |
> 69.3 | 3.26
>
> Test case 2: 1 billion rows, 1 column (int)
>
> Command | HEAD (sec) | PATCHED (sec) | Faster by %
> | Faster by X
> ------------------------------ | ---------- | ------------- | -----------
> | -----------
> CREATE TABLE AS | 794 | 386 | 51.38
> | 2.05
> CREATE MATERIALIZED VIEW | 1006 | 563 | 44.03
> | 1.78
> REFRESH MATERIALIZED VIEW | 977 | 603 | 38.28
> | 1.62
> ALTER TABLE (TABLE REWRITE) | 1189 | 714 | 39.94
> | 1.66
> COPY FROM | 321 | 330 | -0.02
> | 0.97
> INSERT INTO ... SELECT | 1084 | 586 | 45.94
> | 1.84
> LOGICAL REPLICATION APPLY | 3530 | 2982 | 15.52
> | 1.18
>
> Command | HEAD (WAL in GB) | PATCHED (WAL in GB) |
> Reduced by % | Reduced by X
> ------------------------------ | ---------------- | ------------------- |
> ------------ | -----------
> CREATE TABLE AS | 60 | 12 |
> 80 | 5
> CREATE MATERIALIZED VIEW | 60 | 12 |
> 80 | 5
> REFRESH MATERIALIZED VIEW | 60 | 12 |
> 80 | 5
> ALTER TABLE (TABLE REWRITE) | 123 | 31 |
> 60 | 2.5
> COPY FROM | 12 | 12 |
> 0 | 1
> INSERT INTO ... SELECT | 120 | 24 |
> 80 | 5
> LOGICAL REPLICATION APPLY | 61 | 12 |
> 80.32 | 5
>
looks pretty impressive!
Pavel
>
> Test setup:
> ./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" >
> install.log && make -j 8 install > install.log 2>&1 &
>
> wal_level=logical
> max_wal_size = 256GB
> checkpoint_timeout = 1h
>
> Test system is EC2 instance of type c5.4xlarge:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 46 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 16
> On-line CPU(s) list: 0-15
> Vendor ID: GenuineIntel
> Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
> CPU family: 6
> Model: 85
> Thread(s) per core: 2
> Core(s) per socket: 8
> Socket(s): 1
> Stepping: 7
> BogoMIPS: 5999.99
> Caches (sum of all):
> L1d: 256 KiB (8 instances)
> L1i: 256 KiB (8 instances)
> L2: 8 MiB (8 instances)
> L3: 35.8 MiB (1 instance)
> NUMA:
> NUMA node(s): 1
> NUMA node0 CPU(s): 0-15
> RAM:
> MemTotal: 32036536 kB
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
From | Date | Subject | |
---|---|---|---|
Next Message | Bharath Rupireddy | 2024-04-24 16:08:57 | Use WALReadFromBuffers in more places |
Previous Message | Tom Lane | 2024-04-24 15:21:13 | Re: Tarball builds in the new world order |