Quick Links

RE: POC: postgres_fdw insert batching

From:	"tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To:	'Tomas Vondra' <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	'Tomas Vondra' <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: POC: postgres_fdw insert batching
Date:	2020-11-25 05:04:36
Message-ID:	TYAPR01MB2990EF5668C41AC71EC3EDDDFEFA0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
> On 11/24/20 9:45 AM, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
> > OTOH, as for the name GetModifyBatchSize() you suggest, I think
> GetInsertBatchSize may be better. That is, this API deals with multiple
> records in a single INSERT statement. Your GetModifyBatchSize will be
> reserved for statement batching when libpq has supported batch/pipelining to
> execute multiple INSERT/UPDATE/DELETE statements, as in the following
> JDBC batch updates. What do you think?
> >
>
> I don't know. I was really only thinking about batching in the context
> of a single DML command, not about batching of multiple commands at the
> protocol level. IMHO it's far more likely we'll add support for batching
> for DELETE/UPDATE than libpq pipelining, which seems rather different
> from how the FDW API works. Which is why I was suggesting to use a name
> that would work for all DML commands, not just for inserts.

Right, I can't imagine now how the interaction among the client, server core and FDWs would be regarding the statement batching. So I'll take your suggested name.

> Not sure, but I'd guess knowing whether batching is used would be
> useful. We only print the single-row SQL query, which kinda gives the
> impression that there's no batching.

Added in postgres_fdw like "Remote SQL" when EXPLAIN VERBOSE is run.

> > Don't worry about this, too. GetMaxBulkInsertTuples() just returns a value
> that was already saved in a struct in create_foreign_modify().
> >
>
> Well, I do worry for two reasons.
>
> Firstly, the fact that in postgres_fdw the call is cheap does not mean
> it'll be like that in every other FDW. Presumably, the other FDWs might
> cache it in the struct and do the same thing, of course.
>
> But the fact that we're calling it over and over for each row kinda
> seems like we allow the value to change during execution, but I very
> much doubt the code is expecting that. I haven't tried, but assume the
> function first returns 10 and then 100. ISTM the code will allocate
> ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
> That can't end well. Sure, we can claim it's a bug in the FDW extension,
> but it's also due to the API design.

You worried about other FDWs than postgres_fdw. That's reasonable. I insisted in other threads that PG developers care only about postgres_fdw, not other FDWs, when designing the FDW interface, but I myself made the same mistake. I made changes so that the executor calls GetModifyBatchSize() once per relation per statement.

Regards
Takayuki Tsunakawa

Attachment	Content-Type	Size
v5-0001-Add-bulk-insert-for-foreign-tables.patch	application/octet-stream	49.0 KB

In response to

Re: POC: postgres_fdw insert batching at 2020-11-24 17:08:57 from Tomas Vondra

Responses

Re: POC: postgres_fdw insert batching at 2020-11-30 08:29:02 from David Fetter
Re: POC: postgres_fdw insert batching at 2021-01-12 02:06:29 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andy Fan	2020-11-25 05:12:59	Re: About adding a new filed to a struct in primnodes.h
Previous Message	Thomas Munro	2020-11-25 05:00:06	Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)