Re: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: 'Tomas Vondra' <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC: postgres_fdw insert batching
Date: 2020-11-10 17:16:38
Message-ID: 43d12f2c-7c7d-f90b-8cde-b8104e8f1cb4@enterprisedb.com
Lists: pgsql-hackers

On 11/10/20 4:05 PM, Tomas Vondra wrote:
> Hi,
>
> Thanks for working on this!
>
> On 11/10/20 1:45 AM, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
>> Hello,
>>
>>
>> The attached patch implements the new bulk insert routine for
>> postgres_fdw and the executor utilizing it. It passes make
>> check-world.
>>
>
> I haven't done any testing yet, just a quick review.
>
> I see the patch builds the "bulk" query in execute_foreign_modify. IMO
> that's something we should do earlier, when we're building the simple
> query (for 1-row inserts). I'd understand if you were concerned about
> overhead in case of 1-row inserts, trying to not plan the bulk query
> until necessary, but I'm not sure this actually helps.
>
> Or was the goal to build a query for every possible number of slots? I
> don't think that's really useful, considering it requires deallocating
> the old plan, preparing a new one, etc. IMO it should be sufficient to
> have two queries - one for 1-row inserts, one for the full batch. The
> last incomplete batch can be inserted using a loop of 1-row queries.
>
> That's what my patch was doing, but I'm not insisting on that - it just
> seems like a better approach to me. So feel free to argue why this is
> better.
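
To make the two-statement idea concrete, roughly this is the flow I have in mind (an illustrative Python sketch, not the actual C code; BATCH_SIZE and the exec_* callbacks are placeholder names):

```python
# Sketch of the "two prepared statements" strategy: one statement for
# single-row inserts, one for exactly-full batches; the trailing partial
# batch falls back to a loop of single-row inserts, so only two plans
# ever need to be prepared.
BATCH_SIZE = 100  # illustrative; the patch currently uses 100 rows

def insert_rows(rows, exec_single, exec_batch):
    """exec_single inserts one row; exec_batch inserts exactly BATCH_SIZE rows."""
    full_batches, remainder = divmod(len(rows), BATCH_SIZE)
    for i in range(full_batches):
        exec_batch(rows[i * BATCH_SIZE:(i + 1) * BATCH_SIZE])
    # Incomplete final batch: reuse the 1-row statement in a loop.
    for row in rows[full_batches * BATCH_SIZE:]:
        exec_single(row)
```

This avoids deallocating and re-preparing a plan for every possible batch size.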
>
>
>> I measured performance in a basic non-partitioned case by modifying
>> Tomas-san's scripts. They perform an INSERT SELECT statement that
>> copies one million records. The table consists of two integer
>> columns, with a primary key on one of them. You can run the
>> attached prepare.sql to set up once. local.sql inserts to the table
>> directly, while fdw.sql inserts through a foreign table.
>>
>> The performance results, the average time of 5 runs, were as follows
>> on a Linux host where the average round-trip time of "ping localhost"
>> was 34 us:
>>
>> master, local:    6.1 seconds
>> master, fdw:    125.3 seconds
>> patched, fdw:    11.1 seconds (11x improvement)
>>
>
> Nice. I think we can't really get much closer to local master, so 6.1
> vs. 11.1 seconds looks quite acceptable.
>
>>
>> The patch accumulates at most 100 records in ModifyTableState before
>> inserting in bulk. Also, when an input record targets a different
>> relation (= partition) than the already accumulated records, the
>> accumulated records are inserted first, and the new record is stored
>> for a later insert.
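
For reference, the accumulate-and-flush behaviour described above is roughly the following (an illustrative Python sketch; MAX_BATCH, run and flush are made-up names, not the patch's identifiers):

```python
MAX_BATCH = 100  # records accumulated before a flush (the patch's limit)

def run(tuples, flush):
    """tuples: iterable of (relation, row) pairs in input order;
    flush(relation, rows) performs one bulk insert into that relation."""
    pending_rel, pending = None, []
    for rel, row in tuples:
        # A tuple routed to a different partition forces a flush first.
        if pending and rel != pending_rel:
            flush(pending_rel, pending)
            pending = []
        pending_rel = rel
        pending.append(row)
        # A full batch is flushed immediately.
        if len(pending) >= MAX_BATCH:
            flush(pending_rel, pending)
            pending = []
    if pending:  # final partial batch
        flush(pending_rel, pending)
</antml>```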
>>
>> [Issues]
>>
>> 1. Do we want a GUC parameter, say, max_bulk_insert_records =
>> (integer), to control the number of records inserted at once? The
>> range of allowed values would be between 1 and 1,000. 1 disables
>> bulk insert. The main reason for such a parameter would be to
>> limit the amount of memory used for accumulated
>> records, which could be prohibitively large if each record is big. I
>> don't think this is a must, but I think we can have it.
>>
>
> I think it'd be good to have such a GUC, even if only for testing and
> development. We should probably have a way to disable the batching,
> which the GUC could also do, I think. So +1 to have the GUC.
>
>> 2. Should we accumulate records per relation in ResultRelInfo
>> instead? That is, when inserting into a partitioned table that has
>> foreign partitions, delay insertion until a certain number of input
>> records accumulate, and then insert accumulated records per relation
>> (e.g., 50 records to relation A, 30 records to relation B, and 20
>> records to relation C.) If we do that,
>>
>
> I think there's a chunk of text missing here? If we do that, then what?
>
> Anyway, I don't see why accumulating the records in ResultRelInfo would
> be better than what the patch does now. It seems to me fairly
> specific to FDWs, so keeping it in the FDW state seems appropriate. What
> would be the advantage of stashing it in ResultRelInfo?
>
>> * The order of insertion differs from the order of input records. Is
>> it OK?
>>
>
> I think that's OK for most use cases, and if it's not (e.g. when there's
> something requiring the exact order of writes) then it's not possible to
> use batching. That's one of the reasons why I think we should have a GUC
> to disable the batching.
>
>> * Should the maximum count of accumulated records be applied per
>> relation or per query? When many foreign partitions belong to a
>> partitioned table, if the former is chosen, it may use much memory in
>> total. If the latter is chosen, the records per relation could be
>> few and thus the benefit of bulk insert could be small.
>>
>
> I think it needs to be applied per relation, because that's the level at
> which we can do it easily and consistently. The whole point is to send
> data in sufficiently large chunks to minimize the communication overhead
> (latency etc.), but if you enforce it "per query" that seems hard.
>
> Imagine you're inserting data into a table with many partitions - how do
> you pick the number of rows to accumulate? The table may have 10 or 1000
> partitions, we may be inserting into all partitions or just a small
> subset, not all partitions may be foreign, etc. It seems pretty
> difficult to pick and enforce a reliable limit at the query level. But
> maybe I'm missing something and it's easier than I think?
>
> Of course, you're entirely correct that enforcing this at the partition
> level may require a lot of memory. Sadly, I don't see a way around that,
> except for (a) disabling batching or (b) ordering the data to insert
> data into one partition at a time.
>

Two more comments regarding this:

1) If we want to be more strict about the memory consumption, we should
probably set the limit in terms of memory, not number of rows. Currently,
100 rows may be 10kB or 10MB; there's no way to know in advance. Of course,
this is not the only place with this issue.
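
A memory-based limit could look roughly like this (again an illustrative Python sketch; BATCH_MEM_LIMIT, add_to_batch and the per-row size are made-up placeholders, not anything in the patch):

```python
BATCH_MEM_LIMIT = 64 * 1024  # hypothetical limit in bytes, not rows

def add_to_batch(batch, row, row_size, flush):
    """Flush once accepting the row would exceed the memory limit.
    batch is a dict {"rows": [...], "bytes": n}; flush takes the row list."""
    if batch["rows"] and batch["bytes"] + row_size > BATCH_MEM_LIMIT:
        flush(batch["rows"])
        batch["rows"], batch["bytes"] = [], 0
    batch["rows"].append(row)
    batch["bytes"] += row_size
```

With this, 100 tiny rows and 3 huge rows both stay under the same memory bound, at the cost of having to estimate a per-row size.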

2) I wonder what the COPY FROM patch [1] does in this regard. I don't
have time to check right now, but I suggest we try to do the same thing,
if only to be consistent.

[1]
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
