Quick Links

Re: Make COPY format extendable: Extract COPY TO format implementations

From:	"David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To:	Sutou Kouhei <kou(at)clear-code(dot)com>
Cc:	sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Make COPY format extendable: Extract COPY TO format implementations
Date:	2025-03-31 17:05:34
Message-ID:	CAKFQuwbhSssKTJyeYo9rn20zffV3L7wdQSbEQ8zwRfC=uXLkVA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Mar 26, 2025 at 8:28 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:

>
> > ---
> > +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
> > + .type = T_CopyFromRoutine,
> > + .CopyFromInFunc = CopyFromInFunc,
> > + .CopyFromStart = CopyFromStart,
> > + .CopyFromOneRow = CopyFromOneRow,
> > + .CopyFromEnd = CopyFromEnd,
> > +};
> >
>
>
In trying to document the current API I'm strongly disliking it. Namely,
the amount of internal code an extension needs to care about/copy-paste to
create a working handler.

Presently, pg_type defines text and binary I/O routines and the text/csv
formats use the text I/O while binary uses binary I/O - for all
attributes. The CopyFromInFunc API allows for each attribute to somehow
have its I/O format individualized. But I don't see how that is practical
or useful, and it adds burden on API users.

I suggest we remove both .CopyFromInFunc and .CopyFromStart/End and add a
property to CopyFromRoutine (.ioMode?) with values of either Copy_IO_Text
or Copy_IO_Binary and then just branch to either:

CopyFromTextLikeInFunc & CopyFromTextLikeStart/End
or
CopyFromBinaryInFunc & CopyFromStart/End

So, in effect, the only method an extension needs to write is converting
to/from the 'serialized' form to the text/binary form (text being near
unanimous).

In a similar manner, the amount of boilerplate within CopyFromOneRow seems
undesirable from an API perspective.

cstate->cur_attname = NameStr(att->attname);
cstate->cur_attval = string;

if (string != NULL)
nulls[m] = false;

if (cstate->defaults[m])
{
/* We must have switched into the per-tuple memory context */
Assert(econtext != NULL);
Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);

values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}

/*
* If ON_ERROR is specified with IGNORE, skip rows with soft errors
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
typioparams[m],
att->atttypmod,
(Node *) cstate->escontext,
&values[m]))
{
CopyFromSkipErrorRow(cstate);
return true;
}

cstate->cur_attname = NULL;
cstate->cur_attval = NULL;

It seems to me that CopyFromOneRow could simply produce a *string
collection,
one cell per attribute, and NextCopyFrom could do all of the above on a
for-loop over *string

The pg_type and pg_proc catalogs are not extensible so the API can and
should be limited to
producing the, usually text, values that are ready to be passed into the
text I/O routines and all
the work to find and use types and functions left in the template code.

I haven't looked at COPY TO but I am expecting much the same. The API
should simply receive
the content of the type I/O output routine (binary or text as it dictates)
for each output attribute, by
row, and be expected to take those values and produce a final output from
them.

David J.

In response to

Re: Make COPY format extendable: Extract COPY TO format implementations at 2025-03-27 03:28:40 from Sutou Kouhei

Responses

Re: Make COPY format extendable: Extract COPY TO format implementations at 2025-04-04 06:42:48 from Sutou Kouhei

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2025-03-31 17:11:58	Re: general purpose array_sort
Previous Message	Robert Haas	2025-03-31 16:55:45	Re: Reduce "Var IS [NOT] NULL" quals during constant folding