| From: | "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru> |
|---|---|
| To: | Amit Langote <amitlangote09(at)gmail(dot)com> |
| Cc: | Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [POC] Fast COPY FROM command for the table with foreign partitions |
| Date: | 2020-07-29 08:36:01 |
| Message-ID: | 9a65d03b-250f-b086-5a3c-541d8e2de30d@postgrespro.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 7/29/20 1:03 PM, Amit Langote wrote:
> Hi Andrey,
>
> Thanks for updating the patch. I will try to take a look later.
>
> On Wed, Jul 22, 2020 at 6:09 PM Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 7/16/20 2:14 PM, Amit Langote wrote:
>>> * Why the "In" in these API names?
>>>
>>> + /* COPY a bulk of tuples into a foreign relation */
>>> + BeginForeignCopyIn_function BeginForeignCopyIn;
>>> + EndForeignCopyIn_function EndForeignCopyIn;
>>> + ExecForeignCopyIn_function ExecForeignCopyIn;
>>
>> I used an analogy from copy.c.
>
> Hmm, if we were going to also need *ForeignCopyOut APIs, maybe it
> makes sense to have "In" here, but maybe we don't, so how about
> leaving out the "In" for clarity?
Ok, sounds good.
>
>>> * I see that the remote copy is performed from scratch on every call
>>> of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
>>> send the `COPY remote_table FROM STDIN` in
>>> postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
>>> when there are no errors during the copy?
>>
>> It is not possible. FDW share one connection between all foreign
>> relations from a server. If two or more partitions will be placed at one
>> foreign server you will have problems with concurrent COPY command.
>
> Ah, you're right. I didn't consider multiple foreign partitions
> pointing to the same server. Indeed, we would need separate
> connections to a given server to COPY to multiple remote relations on
> that server in parallel.
>
>> May be we can create new connection for each partition?
>
> Yeah, perhaps, although it sounds like something that might be more
> generally useful and so we should work on that separately if at all.
I will try to prepare a separate patch.
>
>>> I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
>>> and sending `COPY remote_table FROM STDIN` only once instead of on
>>> every flush -- and I see significant speedup. Please check the
>>> attached patch that applies on top of yours.
>>
>> I integrated first change and rejected the second by the reason as above.
>
> Thanks.
>
> Will send more comments after reading the v5 patch.
>
Ok. I'll be waiting for the end of your review.
--
regards,
Andrey Lepikhov
Postgres Professional
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Lu, Chenyang | 2020-07-29 08:42:27 | [PATCH]Fix ja.po error |
| Previous Message | Michael Paquier | 2020-07-29 08:32:53 | Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions? |