Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-11-27 07:53:44
Message-ID: 20241127.165344.1671821093365528062.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <CAD21AoBW5dEv=Gd2iF_BYNZGEsF=3KTG7fpq=vP5qwpC1CAOeA(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 25 Nov 2024 23:10:50 -0800,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>> Custom COPY format extensions need to use
>> CopyToStateData/CopyFromStateData. For example,
>> CopyToStateData::rel is used to retrieve table schema. If we
>> move CopyToStateData to copyto_internal.h not copyapi.h,
>> custom COPY format extensions need to include
>> copyto_internal.h. I feel that it's strange that extensions
>> need to use internal headers.
>>
>> What is your real concern? If you don't want to export
>> CopyToStateData/CopyFromStateData entirely, we can provide
>> accessors only for some members of them.
>
> I'm not against exposing CopyToStateData and CopyFromStateData. My
> concern is that if we move all copy-related structs to copyapi.h,
> other copy-related files would need to include copyapi.h even if the
> file is not related to copy format APIs. IMO copyapi.h should have
> only copy-format-API-related variables structs such as CopyFromRoutine
> and CopyToRoutine and functions that custom COPY format extension can
> utilize to access data source and destination, such as CopyGetData().
>
> When it comes to CopyToStateData and CopyFromStateData, I feel that
> they have mixed fields of common fields (e.g., rel, num_errors, and
> transition_capture) and format-specific fields (e.g., input_buf,
> line_buf, and eol_type). While it makes sense to me that custom copy
> format extensions can access the common fields, it seems odd to me
> that they can access text-and-csv-format-specific fields such as
> input_buf. We might want to sort out these fields but it could be a
> huge task.

I understand you concern.

How about using Copy{To,From}StateData::opaque to store
text-and-csv-format-specific data? I feel that this
refactoring doesn't block the 0001/0002 patches. Do you
think that this is a blocker of the 0001/0002 patches?

I think that this may block the 0004/0007 patches that
export Copy{To,From}StateData. But we can work on it after
we merge the 0004/0007 patches. Which is preferred?

> Also, I realized that CopyFromTextLikeOneRow() does input function
> calls and handle soft errors based on ON_ERROR and LOG_VERBOSITY
> options. I think these should be done in the core side.

How about extracting the following part in NextCopyFrom() as
a function and provide it for extensions?

----
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);

cstate->num_errors++;

if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
{
/*
* Since we emit line number and column info in the below
* notice message, we suppress error context information
* other than the relation name.
*/
Assert(!cstate->relname_only);
cstate->relname_only = true;

if (cstate->cur_attval)
{
char *attval;

attval = CopyLimitPrintoutLength(cstate->cur_attval);
ereport(NOTICE,
errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
(unsigned long long) cstate->cur_lineno,
cstate->cur_attname,
attval));
pfree(attval);
}
else
ereport(NOTICE,
errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
(unsigned long long) cstate->cur_lineno,
cstate->cur_attname));

/* reset relname_only */
cstate->relname_only = false;
}
----

See the attached v27 patch set for this idea.

0001-0008 are almost same as the v26 patch set.
("format" -> "FORMAT" in COPY test changes are included.)

0009 exports the above code as
CopyFromSkipErrorRow(). Extensions should call it when they
use errsave() for a soft error in CopyFromOneRow callback.

Thanks,
--
kou

Attachment Content-Type Size
v27-0001-Refactor-COPY-TO-to-use-format-callback-function.patch text/x-patch 17.9 KB
v27-0002-Refactor-COPY-FROM-to-use-format-callback-functi.patch text/x-patch 32.5 KB
v27-0003-Add-support-for-adding-custom-COPY-TO-format.patch text/x-patch 17.6 KB
v27-0004-Export-CopyToStateData.patch text/x-patch 9.1 KB
v27-0005-Add-support-for-implementing-custom-COPY-TO-form.patch text/x-patch 2.0 KB
v27-0006-Add-support-for-adding-custom-COPY-FROM-format.patch text/x-patch 9.1 KB
v27-0007-Export-CopyFromStateData.patch text/x-patch 17.4 KB
v27-0008-Add-support-for-implementing-custom-COPY-FROM-fo.patch text/x-patch 1.9 KB
v27-0009-Add-CopyFromSkipErrorRow-for-custom-COPY-format-.patch text/x-patch 10.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-11-27 08:01:26 Re: Support LIKE with nondeterministic collations
Previous Message wenhui qiu 2024-11-27 07:51:05 Re: Auto Vacuum optimisation