Re: Make COPY format extendable: Extract COPY TO format implementations

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-07 08:46:53
Message-ID: CAEG8a3+VpxYvmN3zjDuTOZx7nkn3kxOcO5JB=APQZ4XgkXTs+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 7, 2023 at 1:05 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAEG8a3LSRhK601Bn50u71BgfNWm4q3kv-o-KEq=hrbyLbY_EsA(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 6 Dec 2023 22:07:51 +0800,
> Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
>
> > Should we extract both *copy to* and *copy from* for the first step, in that
> > case we can add the pg_copy_handler catalog smoothly later.
>
> I don't object it (mixing TO/FROM changes to one patch) but
> it may make review difficult. Is it acceptable?
>
> FYI: I planed that I implement TO part, and then FROM part,
> and then unify TO/FROM parts if needed. [1]

I'm fine with step by step refactoring, let's just wait for more
suggestions.

>
> > Attached V4 adds 'extract copy from' and it passed the cirrus ci,
> > please take a look.
>
> Thanks. Here are my comments:
>
> > + /*
> > + * Error is relevant to a particular line.
> > + *
> > + * If line_buf still contains the correct line, print it.
> > + */
> > + if (cstate->line_buf_valid)
>
> We need to fix the indentation.
>
> > +CopyFromFormatBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
> > +{
> > + FmgrInfo *in_functions;
> > + Oid *typioparams;
> > + Oid in_func_oid;
> > + AttrNumber num_phys_attrs;
> > +
> > + /*
> > + * Pick up the required catalog information for each attribute in the
> > + * relation, including the input function, the element type (to pass to
> > + * the input function), and info about defaults and constraints. (Which
> > + * input function we use depends on text/binary format choice.)
> > + */
> > + num_phys_attrs = tupDesc->natts;
> > + in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> > + typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
>
> We need to update the comment because defaults and
> constraints aren't picked up here.
>
> > +CopyFromFormatTextStart(CopyFromState cstate, TupleDesc tupDesc)
> ...
> > + /*
> > + * Pick up the required catalog information for each attribute in the
> > + * relation, including the input function, the element type (to pass to
> > + * the input function), and info about defaults and constraints. (Which
> > + * input function we use depends on text/binary format choice.)
> > + */
> > + in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> > + typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
>
> ditto.
>
>
> > @@ -1716,15 +1776,6 @@ BeginCopyFrom(ParseState *pstate,
> > ReceiveCopyBinaryHeader(cstate);
> > }
>
> I think that this block should be moved to
> CopyFromFormatBinaryStart() too. But we need to run it after
> we setup inputs such as data_source_cb, pipe and filename...
>
> +/* Routines for a COPY HANDLER implementation. */
> +typedef struct CopyHandlerOps
> +{
> + /* Called when COPY TO is started. This will send a header. */
> + void (*copy_to_start) (CopyToState cstate, TupleDesc tupDesc);
> +
> + /* Copy one row for COPY TO. */
> + void (*copy_to_one_row) (CopyToState cstate, TupleTableSlot *slot);
> +
> + /* Called when COPY TO is ended. This will send a trailer. */
> + void (*copy_to_end) (CopyToState cstate);
> +
> + void (*copy_from_start) (CopyFromState cstate, TupleDesc tupDesc);
> + bool (*copy_from_next) (CopyFromState cstate, ExprContext *econtext,
> + Datum *values, bool *nulls);
> + void (*copy_from_error_callback) (CopyFromState cstate);
> + void (*copy_from_end) (CopyFromState cstate);
> +} CopyHandlerOps;
>
> It seems that "copy_" prefix is redundant. Should we use
> "to_start" instead of "copy_to_start" and so on?
>
> BTW, it seems that "COPY FROM (FORMAT json)" may not be implemented. [2]
> We may need to care about NULL copy_from_* cases.
>
>
> > I added a hook *copy_from_end* but this might be removed later if not used.
>
> It may be useful to clean up resources for COPY FROM but the
> patch doesn't call the copy_from_end. How about removing it
> for now? We can add it and call it from EndCopyFrom() later?
> Because it's not needed for now.
>
> I think that we should focus on refactoring instead of
> adding a new feature in this patch.
>
>
> [1]: https://www.postgresql.org/message-id/20231204.153548.2126325458835528809.kou%40clear-code.com
> [2]: https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
>
>
> Thanks,
> --
> kou

--
Regards
Junwang Zhao

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2023-12-07 09:00:01 Re: postgres_fdw test timeouts
Previous Message Michael Paquier 2023-12-07 08:44:53 Re: Test 002_pg_upgrade fails with olddump on Windows