From: | "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Cc: | 'Tomas Vondra' <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru> |
Subject: | RE: POC: postgres_fdw insert batching |
Date: | 2020-11-10 00:45:50 |
Message-ID: | TYAPR01MB2990ECD1C68EA694DD0667E4FEE90@TYAPR01MB2990.jpnprd01.prod.outlook.com |
Lists: pgsql-hackers
Hello,
The attached patch implements the new bulk insert routine for postgres_fdw and the executor changes that use it. It passes make check-world.
I measured performance in a basic non-partitioned case by modifying Tomas-san's scripts. They run an INSERT SELECT statement that copies one million records. The table consists of two integer columns, with a primary key on one of them. Run the attached prepare.sql once to set everything up. local.sql inserts into the table directly, while fdw.sql inserts through a foreign table.
The performance results, the average time of 5 runs, were as follows on a Linux host where the average round-trip time of "ping localhost" was 34 us:
master, local: 6.1 seconds
master, fdw: 125.3 seconds
patched, fdw: 11.1 seconds (11x improvement)
The patch accumulates at most 100 records in ModifyTableState before inserting them in bulk. Also, when an input record targets a different relation (= partition) than the already accumulated records, the accumulated records are inserted first and the new record is stored for a later bulk insert.
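To make the control flow concrete, here is a self-contained sketch of the accumulate-and-flush policy. Names such as BulkBuffer and flush_buffer are illustrative only, not the patch's actual identifiers; the real code buffers the incoming tuples (presumably as TupleTableSlots) inside ModifyTableState.

/*
 * Sketch of the batching policy: buffer tuples until either
 * MAX_BUFFERED_TUPLES is reached or the target relation changes,
 * then flush the buffer as one bulk insert.
 */
#include <stdio.h>

#define MAX_BUFFERED_TUPLES 100

typedef struct BulkBuffer
{
    int  target_relid;                 /* relation the buffered tuples belong to */
    int  ntuples;                      /* number of tuples currently buffered */
    long tuples[MAX_BUFFERED_TUPLES];
} BulkBuffer;

/* stand-in for the FDW's bulk-insert callback */
static void
flush_buffer(BulkBuffer *buf)
{
    if (buf->ntuples == 0)
        return;
    printf("bulk insert of %d tuples into relation %d\n",
           buf->ntuples, buf->target_relid);
    buf->ntuples = 0;
}

/* called once per input tuple, mirroring the executor loop */
static void
buffer_tuple(BulkBuffer *buf, int relid, long tuple)
{
    /* different target partition: flush what we have first */
    if (buf->ntuples > 0 && buf->target_relid != relid)
        flush_buffer(buf);

    buf->target_relid = relid;
    buf->tuples[buf->ntuples++] = tuple;

    if (buf->ntuples == MAX_BUFFERED_TUPLES)
        flush_buffer(buf);
}

int
main(void)
{
    BulkBuffer buf = {0};

    for (long i = 0; i < 250; i++)
        buffer_tuple(&buf, (int) (i / 130), i);  /* relation changes at i = 130 */
    flush_buffer(&buf);   /* flush the remainder at end of statement */
    return 0;
}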
[Issues]
1. Do we want a GUC parameter, say, max_bulk_insert_records = (integer), to control the number of records inserted at once?
The range of allowed values would be 1 to 1,000, where 1 disables bulk insert.
The reason for such a parameter would be to limit the amount of memory used for accumulated records, which could grow prohibitively large if each record is big. I don't think this is a must, but I think we can have it.
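If we decide to have it, and if the parameter lived in postgres_fdw rather than core (the name, prefix, and placement below are just assumptions, not part of the attached patch), it could be a custom GUC defined at module load time:

/* Hypothetical sketch only; the attached patch does not add this GUC. */
#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;   /* already present in the real postgres_fdw.c */

static int max_bulk_insert_records = 100;

void _PG_init(void);

void
_PG_init(void)
{
    DefineCustomIntVariable("postgres_fdw.max_bulk_insert_records",
                            "Maximum number of records inserted at once.",
                            "A value of 1 disables bulk insert.",
                            &max_bulk_insert_records,
                            100,         /* default */
                            1,           /* min (disables batching) */
                            1000,        /* max */
                            PGC_USERSET,
                            0,           /* flags */
                            NULL, NULL, NULL);
}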
2. Should we accumulate records per relation in ResultRelInfo instead?
That is, when inserting into a partitioned table that has foreign partitions, delay insertion until a certain number of input records accumulate, and then insert the accumulated records per relation (e.g., 50 records into relation A, 30 into relation B, and 20 into relation C). If we do that:
* The order of insertion differs from the order of input records. Is it OK?
* Should the maximum count of accumulated records apply per relation or per query?
When a partitioned table has many foreign partitions, the former may use a lot of memory in total, while with the latter the records per relation could be few, so the benefit of bulk insert could be small. (A rough sketch of the per-relation state appears below.)
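For concreteness, the per-relation buffering might look something like the following. The struct and field names are hypothetical, not from the attached patch; the real state would presumably hang off each ResultRelInfo.

#include "postgres.h"
#include "executor/tuptable.h"

/*
 * Hypothetical per-relation buffer state attached to each ResultRelInfo;
 * names are illustrative only.
 */
typedef struct RelationBulkInsertState
{
    TupleTableSlot **buffered_slots;  /* tuples awaiting bulk insert */
    int              nbuffered;       /* tuples currently buffered here */
    int              max_buffered;    /* per-relation cap; with a per-query
                                       * cap this would instead be checked
                                       * against a counter shared by all
                                       * result relations */
} RelationBulkInsertState;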
Regards
Takayuki Tsunakawa
Attachment | Content-Type | Size |
---|---|---|
fdw.sql | application/octet-stream | 52 bytes |
local.sql | application/octet-stream | 51 bytes |
prepare.sql | application/octet-stream | 544 bytes |
v1-0001-Add-bulk-insert-for-foreign-tables.patch | application/octet-stream | 30.2 KB |