From: | Daniil Davydov <3danissimo(at)gmail(dot)com> |
---|---|
To: | Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> |
Cc: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM |
Date: | 2025-03-09 11:27:58 |
Message-ID: | CAJDiXggcx+v7eKruvvBK-mpyf3Y3e8vgBJhcZwhkm4p6907edw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
A few days ago I came up with an idea to implement multi insert
optimization wherever possible. I prepared a raw patch
and it showed a great performance gain (up to 4 times during INSERT
... INTO ... in the best case).
Then I was very happy to find this thread. You did a great job and I
want to help you to bring the matter to an end.
On Thu, Oct 31, 2024 at 11:17 AM Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> wrote:
> I did some performance test these days, and I have some findings.
> HEAD:
> 12.29% postgres [.] pg_checksum_block
> 6.33% postgres [.] GetPrivateRefCountEntry
> 5.40% postgres [.] pg_comp_crc32c_sse42
> 4.54% [kernel] [k] copy_user_enhanced_fast_string
> 2.69% postgres [.] BufferIsValid
> 1.52% postgres [.] XLogRecordAssemble
>
> Patched:
> 11.75% postgres [.] tts_virtual_materialize
> 8.87% postgres [.] pg_checksum_block
> 8.17% postgres [.] slot_deform_heap_tuple
> 8.09% postgres [.] heap_compute_data_size
> 6.17% postgres [.] fill_val
> 3.81% postgres [.] heap_fill_tuple
> 3.37% postgres [.] tts_virtual_copyslot
> 2.62% [kernel] [k] copy_user_enhanced_fast_string
I applied v25 patches on the master branch and made some measurements
to find out what is the bottleneck in this case. The 'time' utility
showed that without a patch, this query will run 1.5 times slower. I
also made a few flamegraphs for this test. Most of the time is spent
calling
these two functions : tts_virtual_copyslot and heap_form_tuple.
All tests were run in virtual machine with these CPU characteristics:
Architecture: x86_64
CPU(s): 2
On-line CPU(s) list: 0,1
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 128 KiB (2 instances)
L1i: 128 KiB (2 instances)
L2: 1 MiB (2 instances)
L3: 32 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0,1
In my implementation, I used Tuplestore functionality to store tuples.
In order to get rid of getting stuck in the above mentioned functions,
I crossed it with the current implementation (v25 patches) and got a
10% increase in performance (for the test above). I also set up v22
patches to
compare performance (with/without tuplestore) for INSERT ... INTO ...
queries (with -j 4 -c 10 parameters for pgbech), and there also was an
increase in TPS (about 3-4%).
I attach a patch that adds Tuplestore to v25. What do you think about this idea?
--
Best regards,
Daniil Davydov
Attachment | Content-Type | Size |
---|---|---|
0001-Replace-holding-tuples-in-virtual-slots-with-tuplest.patch | text/x-patch | 5.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2025-03-09 11:38:27 | Re: Enhance file_fdw to report processed and skipped tuples in COPY progress |
Previous Message | Jacob Brazeal | 2025-03-09 08:09:24 | Incorrect assert in libpqwalreceiver |