Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: zhjwpku(at)gmail(dot)com
Cc: nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-07 05:04:58
Message-ID: 20231207.140458.425537343057608813.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <CAEG8a3LSRhK601Bn50u71BgfNWm4q3kv-o-KEq=hrbyLbY_EsA(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 6 Dec 2023 22:07:51 +0800,
Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:

> Should we extract both *copy to* and *copy from* for the first step, in that
> case we can add the pg_copy_handler catalog smoothly later.

I don't object it (mixing TO/FROM changes to one patch) but
it may make review difficult. Is it acceptable?

FYI: I planed that I implement TO part, and then FROM part,
and then unify TO/FROM parts if needed. [1]

> Attached V4 adds 'extract copy from' and it passed the cirrus ci,
> please take a look.

Thanks. Here are my comments:

> + /*
> + * Error is relevant to a particular line.
> + *
> + * If line_buf still contains the correct line, print it.
> + */
> + if (cstate->line_buf_valid)

We need to fix the indentation.

> +CopyFromFormatBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
> +{
> + FmgrInfo *in_functions;
> + Oid *typioparams;
> + Oid in_func_oid;
> + AttrNumber num_phys_attrs;
> +
> + /*
> + * Pick up the required catalog information for each attribute in the
> + * relation, including the input function, the element type (to pass to
> + * the input function), and info about defaults and constraints. (Which
> + * input function we use depends on text/binary format choice.)
> + */
> + num_phys_attrs = tupDesc->natts;
> + in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> + typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));

We need to update the comment because defaults and
constraints aren't picked up here.

> +CopyFromFormatTextStart(CopyFromState cstate, TupleDesc tupDesc)
...
> + /*
> + * Pick up the required catalog information for each attribute in the
> + * relation, including the input function, the element type (to pass to
> + * the input function), and info about defaults and constraints. (Which
> + * input function we use depends on text/binary format choice.)
> + */
> + in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> + typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));

ditto.

> @@ -1716,15 +1776,6 @@ BeginCopyFrom(ParseState *pstate,
> ReceiveCopyBinaryHeader(cstate);
> }

I think that this block should be moved to
CopyFromFormatBinaryStart() too. But we need to run it after
we setup inputs such as data_source_cb, pipe and filename...

+/* Routines for a COPY HANDLER implementation. */
+typedef struct CopyHandlerOps
+{
+ /* Called when COPY TO is started. This will send a header. */
+ void (*copy_to_start) (CopyToState cstate, TupleDesc tupDesc);
+
+ /* Copy one row for COPY TO. */
+ void (*copy_to_one_row) (CopyToState cstate, TupleTableSlot *slot);
+
+ /* Called when COPY TO is ended. This will send a trailer. */
+ void (*copy_to_end) (CopyToState cstate);
+
+ void (*copy_from_start) (CopyFromState cstate, TupleDesc tupDesc);
+ bool (*copy_from_next) (CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
+ void (*copy_from_error_callback) (CopyFromState cstate);
+ void (*copy_from_end) (CopyFromState cstate);
+} CopyHandlerOps;

It seems that "copy_" prefix is redundant. Should we use
"to_start" instead of "copy_to_start" and so on?

BTW, it seems that "COPY FROM (FORMAT json)" may not be implemented. [2]
We may need to care about NULL copy_from_* cases.

> I added a hook *copy_from_end* but this might be removed later if not used.

It may be useful to clean up resources for COPY FROM but the
patch doesn't call the copy_from_end. How about removing it
for now? We can add it and call it from EndCopyFrom() later?
Because it's not needed for now.

I think that we should focus on refactoring instead of
adding a new feature in this patch.

[1]: https://www.postgresql.org/message-id/20231204.153548.2126325458835528809.kou%40clear-code.com
[2]: https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com

Thanks,
--
kou

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2023-12-07 05:11:01 Re: logical decoding and replication of sequences, take 2
Previous Message Amit Kapila 2023-12-07 04:56:55 Re: logical decoding and replication of sequences, take 2