From: | Junwang Zhao <zhjwpku(at)gmail(dot)com> |
---|---|
To: | Sutou Kouhei <kou(at)clear-code(dot)com> |
Cc: | sawada(dot)mshk(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2024-11-27 11:49:17 |
Message-ID: | CAEG8a3LUBcvjwqgt6AijJmg67YN_b_NZ4Kzoxc_dH4rpAq0pKg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Mon, Nov 25, 2024 at 2:02 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <20241125(dot)110620(dot)313152541320718947(dot)kou(at)clear-code(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 25 Nov 2024 11:06:20 +0900 (JST),
> Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> >> I've attached the v25 patches that squashed the minor changes I made
> >> in v24 and incorporated all comments I got so far. I think these two
> >> patches are in good shape. Could you rebase remaining patches on top
> >> of them so that we can see the big picture of this feature?
> >
> > OK. I'll work on it.
>
> I've attached the v26 patch set:
>
> 0001: It's same as 0001 in the v25 patch set.
> 0002: It's same as 0002 in the v25 patch set.
> 0003: It's same as 0003 in the v23 patch set.
> This parses the "format" option and adds support for
> custom TO handler.
> 0004: It's same as 0004 in the v23 patch set.
> This exports CopyToStateData. But the following
> enums/structs/functions aren't moved to copyapi.h from
> copy.h:
> * CopyHeaderChoice
> * CopyOnErrorChoice
> * CopyLogVerbosityChoice
> * CopyFormatOptions
> * copy_data_dest_cb()
> 0005: It's same as 0005 in the v23 patch set.
> This adds missing APIs to implement custom TO handler
> as an extension.
> 0006: It's same as 0008 in the v23 patch set.
> This adds support for custom FROM handler.
> 0007: It's same as 0009 in the v23 patch set.
> This exports CopyFromStateData.
> 0008: It's same as 0010 in the v23 patch set.
> This adds missing APIs to implement custom FROM handler
> as an extension.
>
> I've also updated https://github.com/kou/pg-copy-arrow for
> the current API.
>
> I think that we can merge only 0001/0002 as the first
> step. Because they don't change the current behavior and
> they improve performance. We can merge other patches after
> that.
>
> >> Regarding exposing the structs such as CopyToStateData, v22-0004 patch
> >> moves most of all copy-related structs to copyapi.h from copyto.c,
> >> copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
> >> expose CopyToStateData (and related structs) in a new file
> >> copyto_internal.h and keep other structs in the original header files.
> >
> > Custom COPY format extensions need to use
> > CopyToStateData/CopyFromStateData. For example,
> > CopyToStateData::rel is used to retrieve table schema. If we
> > move CopyToStateData to copyto_internal.h not copyapi.h,
> > custom COPY format extensions need to include
> > copyto_internal.h. I feel that it's strange that extensions
> > need to use internal headers.
> >
> > What is your real concern? If you don't want to export
> > CopyToStateData/CopyFromStateData entirely, we can provide
> > accessors only for some members of them.
>
> The v26 patch set still exports
> CopyToStateData/CopyFromStateData in copyapi.h not
> copy{to,from}_internal.h. But I didn't move the following
> enums/structs/functions:
>
> * CopyHeaderChoice
> * CopyOnErrorChoice
> * CopyLogVerbosityChoice
> * CopyFormatOptions
> * copy_data_dest_cb()
>
> What do you think about this approach?
>
>
> Thanks,
> --
> kou
I just gave this another round of benchmarking tests. I'd like to
share the number,
since COPY TO has some performance drawbacks, I test only COPY TO. I
use the run.sh Tomas provided earlier but use pgbench with a custom script, you
can find it here[0].
I tested 3 branches:
1. the master branch
2. all v26 patch sets applied
3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
if branch in CopyOneRowTo, so I was expecting this slower than master
2 can be about -3%~+3% compared to 1, but what surprised me is that 3
is always better than 1 & 2.
I reviewed the patch set of 3 and I don't see any magic.
You can see the detailed results here[2], I can not upload files so I
just shared the google doc link, ping me if you can not open the link.
[0]: https://github.com/pghacking/scripts/tree/main/extensible_copy
[1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
[2]: https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing
--
Regards
Junwang Zhao
From | Date | Subject | |
---|---|---|---|
Next Message | Kirill Reshke | 2024-11-27 12:04:58 | Re: Add Pipelining support in psql |
Previous Message | Ashutosh Bapat | 2024-11-27 11:38:14 | Re: Difference in dump from original and restored database due to NOT NULL constraints on children |