Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

From: Daniil Davydov <3danissimo(at)gmail(dot)com>
To: Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: 2025-03-09 11:27:58
Message-ID: CAJDiXggcx+v7eKruvvBK-mpyf3Y3e8vgBJhcZwhkm4p6907edw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
A few days ago I came up with an idea to implement multi insert
optimization wherever possible. I prepared a raw patch
and it showed a great performance gain (up to 4 times during INSERT
... INTO ... in the best case).
Then I was very happy to find this thread. You did a great job and I
want to help you to bring the matter to an end.

On Thu, Oct 31, 2024 at 11:17 AM Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> wrote:
> I did some performance test these days, and I have some findings.
> HEAD:
> 12.29% postgres [.] pg_checksum_block
> 6.33% postgres [.] GetPrivateRefCountEntry
> 5.40% postgres [.] pg_comp_crc32c_sse42
> 4.54% [kernel] [k] copy_user_enhanced_fast_string
> 2.69% postgres [.] BufferIsValid
> 1.52% postgres [.] XLogRecordAssemble
>
> Patched:
> 11.75% postgres [.] tts_virtual_materialize
> 8.87% postgres [.] pg_checksum_block
> 8.17% postgres [.] slot_deform_heap_tuple
> 8.09% postgres [.] heap_compute_data_size
> 6.17% postgres [.] fill_val
> 3.81% postgres [.] heap_fill_tuple
> 3.37% postgres [.] tts_virtual_copyslot
> 2.62% [kernel] [k] copy_user_enhanced_fast_string

I applied v25 patches on the master branch and made some measurements
to find out what is the bottleneck in this case. The 'time' utility
showed that without a patch, this query will run 1.5 times slower. I
also made a few flamegraphs for this test. Most of the time is spent
calling
these two functions : tts_virtual_copyslot and heap_form_tuple.
All tests were run in virtual machine with these CPU characteristics:
Architecture: x86_64
CPU(s): 2
On-line CPU(s) list: 0,1
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 128 KiB (2 instances)
L1i: 128 KiB (2 instances)
L2: 1 MiB (2 instances)
L3: 32 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0,1

In my implementation, I used Tuplestore functionality to store tuples.
In order to get rid of getting stuck in the above mentioned functions,
I crossed it with the current implementation (v25 patches) and got a
10% increase in performance (for the test above). I also set up v22
patches to
compare performance (with/without tuplestore) for INSERT ... INTO ...
queries (with -j 4 -c 10 parameters for pgbech), and there also was an
increase in TPS (about 3-4%).

I attach a patch that adds Tuplestore to v25. What do you think about this idea?

--
Best regards,
Daniil Davydov

Attachment Content-Type Size
0001-Replace-holding-tuples-in-virtual-slots-with-tuplest.patch text/x-patch 5.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2025-03-09 11:38:27 Re: Enhance file_fdw to report processed and skipped tuples in COPY progress
Previous Message Jacob Brazeal 2025-03-09 08:09:24 Incorrect assert in libpqwalreceiver