Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: michael(at)paquier(dot)xyz
Cc: andrew(at)dunslane(dot)net, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-25 08:45:43
Message-ID: 20240125.174543.1042528588176841299.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <ZbHS439y-Bs6HIAR(at)paquier(dot)xyz>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 12:17:55 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
> +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
> +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
> +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
> +typedef void (*CopyToEnd_function) (CopyToState cstate);
>
> We don't really need a set of typedefs here, let's put the definitions
> in the CopyToRoutine struct instead.

OK. I'll do it.

> +extern CopyToRoutine CopyToRoutineText;
> +extern CopyToRoutine CopyToRoutineCSV;
> +extern CopyToRoutine CopyToRoutineBinary;
>
> All that should IMO remain in copyto.c and copyfrom.c in the initial
> patch doing the refactoring. Why not using a fetch function instead
> that uses a string in input? Then you can call that once after
> parsing the List of options in ProcessCopyOptions().

OK. How about the following for the fetch function
signature?

extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);

We may introduce an enum and use it:

typedef enum CopyBuiltinFormat
{
COPY_BUILTIN_FORMAT_TEXT = 0,
COPY_BUILTIN_FORMAT_CSV,
COPY_BUILTIN_FORMAT_BINARY,
} CopyBuiltinFormat;

extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);

> +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
> + * move the code to here later. */
> Some areas, like this comment, are written in an incorrect format.

Oh, sorry. I assumed that the comment style was adjusted by
pgindent.

I'll use the following style:

/*
* ...
*/

> + getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
> + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
>
> Actually, this split is interesting. It is possible for a custom
> format to plug in a custom set of out functions. Did you make use of
> something custom for your own stuff?

I didn't. My PoC custom COPY format handler for Apache Arrow
just handles integer and text for now. It doesn't use
cstate->out_functions because cstate->out_functions may not
return a valid binary format value for Apache Arrow. So it
formats each value by itself.

I'll chose one of them for a custom type (that isn't
supported by Apache Arrow, e.g. PostGIS types):

1. Report an unsupported error
2. Call output function for Apache Arrow provided by the
custom type

> Actually, could it make sense to
> split the assignment of cstate->out_functions into its own callback?

Yes. Because we need to use getTypeBinaryOutputInfo() for
"binary" and use getTypeOutputInfo() for "text" and "csv".

> Sure, that's part of the start phase, but at least it would make clear
> that a custom method *has* to assign these OIDs to work. The patch
> implies that as a rule, without a comment that CopyToStart *must* set
> up these OIDs.

CopyToStart doesn't need to set up them if the handler
doesn't use cstate->out_functions.

> I think that 0001 and 0005 should be handled first, as pieces
> independent of the rest. Then we could move on with 0002~0004 and
> 0006~0008.

OK. I'll focus on 0001 and 0005 for now. I'll restart
0002-0004/0006-0008 after 0001 and 0005 are accepted.

Thanks,
--
kou

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sutou Kouhei 2024-01-25 08:52:55 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Bertrand Drouvot 2024-01-25 08:36:17 Re: Split index and table statistics into different types of stats