Re: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: 'Tomas Vondra' <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: postgres_fdw insert batching
Date: 2020-11-18 18:57:30
Message-ID: d9a17f23-81cc-fca1-56e6-602bd2ae06b4@enterprisedb.com
Lists: pgsql-hackers

On 11/17/20 10:11 AM, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
> Hello,
>
>
> Modified the patch as I talked with Tomas-san. The performance
> results of loading one million records into a hash-partitioned table
> with 8 partitions are as follows:
>
> unpatched, local: 8.6 seconds
> unpatched, fdw: 113.7 seconds
> patched, fdw: 12.5 seconds (9x improvement)
>
> The test scripts are also attached. Run prepare.sql once to set up
> tables and source data. Run local_part.sql and fdw_part.sql to load
> source data into a partitioned table with local partitions and a
> partitioned table with foreign tables respectively.
>
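For readers following along, the kind of setup prepare.sql establishes might look roughly like this (a sketch only; server, table, and partition names here are illustrative, not taken from the attached scripts):

```sql
-- Hypothetical sketch of a hash-partitioned table whose partitions are
-- foreign tables, as used in the benchmark above.
CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

CREATE TABLE target (id int, v text) PARTITION BY HASH (id);

-- one foreign partition per remote table, e.g.:
CREATE FOREIGN TABLE target_p0
    PARTITION OF target FOR VALUES WITH (MODULUS 8, REMAINDER 0)
    SERVER loopback OPTIONS (table_name 'target_p0_remote');
-- ... and likewise for remainders 1 through 7
```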

Unfortunately, this does not compile for me, because nodeModifyTable
calls ExecGetTouchedPartitions, which is not defined anywhere. Not sure
what that's about, so I simply commented it out. That probably breaks
the partitioned cases, but it allowed me to do some review and testing.

As for the patch, I have a couple of comments:

1) As I mentioned before, I really don't think we should be doing
deparsing in execute_foreign_modify - that's something that should
happen earlier, and should be in a deparse.c function.

2) I think the GUC should be replaced with a server/table option,
similar to fetch_size.
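For comparison, this is how fetch_size is configured today, and a batch_size option could plausibly follow the same pattern (a sketch only; the server/table names are illustrative and this is not committed syntax):

```sql
-- Existing postgres_fdw option, settable per server or per table:
ALTER SERVER loopback OPTIONS (ADD fetch_size '1000');

-- A batch_size option could presumably mirror that:
ALTER FOREIGN TABLE target_p0 OPTIONS (ADD batch_size '100');
```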

The attached patch tries to address both of these points.

Firstly, it adds a new deparseBulkInsertSql function, which builds a
query for the "full" batch, and then uses both queries: when we get a
full batch we use the bulk query, otherwise we use the single-row query
in a loop. IMO this is cleaner than deparsing queries ad hoc in
execute_foreign_modify.

Of course, this might be worse when we don't have a full batch, e.g. for
a query that inserts only 50 rows with batch_size=100. If this case is
common, one option would be lowering the batch_size accordingly. If we
really want to improve this case too, I suggest we pass more info than
just a position of the VALUES clause - that seems a bit too hackish.

Secondly, it adds the batch_size option to the server/foreign table, and
uses that. This is not complete, though: postgresPlanForeignModify
currently passes a hard-coded value, and it needs to look up the correct
value for the server/table from RelOptInfo or something. And I suppose
the ModifyTable infrastructure will need to determine the value in order
to pass the correct number of slots to the FDW API.

There are a couple of other smaller changes. E.g. it undoes the changes
to finish_foreign_modify, and instead calls separate functions to
prepare the bulk statement. It also adds list_make5/list_make6 macros,
so as not to have to do strange stuff with the parameter lists.

And finally, this should probably add a bunch of regression tests.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v3-0001-Add-bulk-insert-for-foreign-tables.patch text/x-patch 33.9 KB
v3-0002-make-it-compile.patch text/x-patch 846 bytes
v3-0003-reworks.patch text/x-patch 32.6 KB
