From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | michael(at)paquier(dot)xyz |
Cc: | andrew(at)dunslane(dot)net, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2024-01-25 08:45:43 |
Message-ID: | 20240125.174543.1042528588176841299.kou@clear-code.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
In <ZbHS439y-Bs6HIAR(at)paquier(dot)xyz>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 12:17:55 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
> +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
> +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
> +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
> +typedef void (*CopyToEnd_function) (CopyToState cstate);
>
> We don't really need a set of typedefs here, let's put the definitions
> in the CopyToRoutine struct instead.
OK. I'll do it.
> +extern CopyToRoutine CopyToRoutineText;
> +extern CopyToRoutine CopyToRoutineCSV;
> +extern CopyToRoutine CopyToRoutineBinary;
>
> All that should IMO remain in copyto.c and copyfrom.c in the initial
> patch doing the refactoring. Why not using a fetch function instead
> that uses a string in input? Then you can call that once after
> parsing the List of options in ProcessCopyOptions().
OK. How about the following for the fetch function
signature?
extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);
We may introduce an enum and use it:
typedef enum CopyBuiltinFormat
{
COPY_BUILTIN_FORMAT_TEXT = 0,
COPY_BUILTIN_FORMAT_CSV,
COPY_BUILTIN_FORMAT_BINARY,
} CopyBuiltinFormat;
extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);
> +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
> + * move the code to here later. */
> Some areas, like this comment, are written in an incorrect format.
Oh, sorry. I assumed that the comment style was adjusted by
pgindent.
I'll use the following style:
/*
* ...
*/
> + getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
> + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
>
> Actually, this split is interesting. It is possible for a custom
> format to plug in a custom set of out functions. Did you make use of
> something custom for your own stuff?
I didn't. My PoC custom COPY format handler for Apache Arrow
just handles integer and text for now. It doesn't use
cstate->out_functions because cstate->out_functions may not
return a valid binary format value for Apache Arrow. So it
formats each value by itself.
I'll chose one of them for a custom type (that isn't
supported by Apache Arrow, e.g. PostGIS types):
1. Report an unsupported error
2. Call output function for Apache Arrow provided by the
custom type
> Actually, could it make sense to
> split the assignment of cstate->out_functions into its own callback?
Yes. Because we need to use getTypeBinaryOutputInfo() for
"binary" and use getTypeOutputInfo() for "text" and "csv".
> Sure, that's part of the start phase, but at least it would make clear
> that a custom method *has* to assign these OIDs to work. The patch
> implies that as a rule, without a comment that CopyToStart *must* set
> up these OIDs.
CopyToStart doesn't need to set up them if the handler
doesn't use cstate->out_functions.
> I think that 0001 and 0005 should be handled first, as pieces
> independent of the rest. Then we could move on with 0002~0004 and
> 0006~0008.
OK. I'll focus on 0001 and 0005 for now. I'll restart
0002-0004/0006-0008 after 0001 and 0005 are accepted.
Thanks,
--
kou
From | Date | Subject | |
---|---|---|---|
Next Message | Sutou Kouhei | 2024-01-25 08:52:55 | Re: Make COPY format extendable: Extract COPY TO format implementations |
Previous Message | Bertrand Drouvot | 2024-01-25 08:36:17 | Re: Split index and table statistics into different types of stats |