Re: Make COPY format extendable: Extract COPY TO format implementations

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-07-22 12:36:40
Message-ID: 9172d4eb-6de0-4c6d-beab-8210b7a2219b@enterprisedb.com
Lists: pgsql-hackers

On 7/22/24 09:45, Sutou Kouhei wrote:
> Hi Tomas,
>
> Thanks for joining this thread!
>
> In <257d5573-07da-48c3-ac07-e047e7a65e99(at)enterprisedb(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 19 Jul 2024 14:40:05 +0200,
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>> I think it'd be helpful if you could post a patch status, i.e. a message
>> re-explaining what it aims to achieve, summary of the discussion so
>> far, and what you think are the open questions. Otherwise every reviewer
>> has to read the whole thread to learn this.
>
> It makes sense. It seems your questions cover all the important
> points in this thread. So my answers to your questions
> summarize the latest information.
>

Thanks for the summary/responses. I still think it'd be better to post a
summary as a separate message, not as yet another post responding to
someone else. If I was reading the thread, I would not have noticed this
is meant to be a summary. I'd even consider putting a "THREAD SUMMARY"
title on the first line, or something like that. Up to you, of course.

As for the patch / decisions, thanks for the responses and explanations.
But I still find it hard to review / make judgements about the approach
based on the current version of the patch :-( Yes, it's entirely
possible earlier versions did something interesting - e.g. one might have
ported the existing formats to the new approach. Or v6 might have had a
private pointer. But how do I know why it was removed? Was it
because it's unnecessary for the initial version? Or was it because it
turned out to not work?

And when reviewing a patch, I really don't want to scavenge through old
patch versions, looking for random parts. Not only because I don't know
what to look for, but also because it'll be harder and harder to make
those old versions work, as the patch evolves.

My suggestion would be to maintain this as a series of patches, making
incremental changes, with the "more complex" or "more experimental"
parts later in the series. For example, I can imagine doing this:

0001 - minimal version of the patch (e.g. current v17)
0002 - switch existing formats to the new interface
0003 - extend the interface to add bits needed for columnar formats
0004 - add DDL to create/alter/drop custom implementations
0005 - minimal patch with extension adding support for Arrow

Or something like that. The idea is that we still have a coherent story
of what we're trying to do, and can discuss the incremental changes
(easier than looking at a large patch). It's even possible to commit
earlier parts before the later parts are quite cleaned up for commit.
And some changes may not even be meant for commit (e.g. the
extension) but as guidance / validation for the earlier parts.

I do realize this might look like I'm requiring you to do more work.
Sorry about that. I'm just thinking about how to move the patch forward
and convince myself the approach is OK. Also, it's what I think works
quite well for other patches discussed on this mailing list (I do this
for various patches I submitted, for example). And I'm not even sure it
actually is more work.

As for the performance / profiling issues, I've read the reports and I'm
not sure I see something tremendously wrong. Yes, there are differences,
but a 5% change can easily be noise, a shift in binary layout, etc.
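
To illustrate the noise argument, here is a minimal Python sketch that
compares two sets of repeated timings and reports the change in the mean
next to the run-to-run spread of the baseline. The timings and the name
relative_change_vs_noise are made up for illustration; they are not taken
from the reports in this thread.

```python
import statistics


def relative_change_vs_noise(baseline, patched):
    """Return (relative change of the means, relative noise of the baseline)."""
    base_mean = statistics.mean(baseline)
    patched_mean = statistics.mean(patched)
    change = (patched_mean - base_mean) / base_mean
    # Coefficient of variation: sample standard deviation relative to the mean.
    noise = statistics.stdev(baseline) / base_mean
    return change, noise


# Hypothetical timings in milliseconds (illustrative, not measured data).
baseline = [1000.0, 1040.0, 960.0, 1020.0, 980.0]
patched = [1050.0, 1010.0, 1090.0, 1030.0, 1070.0]

change, noise = relative_change_vs_noise(baseline, patched)
print(f"change: {change:+.1%}, baseline noise: {noise:.1%}")
```

With numbers like these, the shift in the mean is less than two standard
deviations of the baseline, so a handful of runs would not be enough to
call it a real regression.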

Unfortunately, there's not much information about what exactly the tests
did, or about the context (hardware, ...). So I don't know, really. But if you share
enough information on how to reproduce this, I'm willing to take a look
and investigate.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
