Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: zhjwpku(at)gmail(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, michael(at)paquier(dot)xyz, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-10 06:20:23
Message-ID: 20240110.152023.1920937326588672387.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:58:05 +0800,
Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:

>> 1. Add an opaque space for custom COPY TO handler
>> * Add CopyToState{Get,Set}Opaque()
>> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>>
>> 2. Export CopyToState::attnumlist
>> * Add CopyToStateGetAttNumList()
>> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>>
>> 3. Export CopySend*()
>> * Rename CopySend*() to CopyToStateSend*() and export them
>> * Exception: CopySendEndOfRow() to CopyToStateFlush() because
>> it just flushes the internal buffer now.
>> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>>
> I guess the purpose of these helpers is to avoid expose CopyToState to
> copy.h,

Yes.

> but I
> think expose CopyToState to user might make life easier, users might want to use
> the memory contexts of the structure (though I agree not all the
> fields are necessary
> for extension handers).

OK. I don't object it as I said in another e-mail:
https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72

>> 2. Need an opaque space like IndexScanDesc::opaque does
>>
>> * A custom COPY TO handler needs to keep its data
>
> I once thought users might want to parse their own options, maybe this
> is a use case for this opaque space.

Good catch! I forgot to suggest a callback for custom format
options. How about the following API?

----
...
typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);

...
typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);

typedef struct CopyToFormatRoutine
{
...
CopyToProcessOption_function process_option_fn;
} CopyToFormatRoutine;

typedef struct CopyFromFormatRoutine
{
...
CopyFromProcessOption_function process_option_fn;
} CopyFromFormatRoutine;
----

----
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e7597894bf..1aa8b62551 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -416,6 +416,7 @@ void
ProcessCopyOptions(ParseState *pstate,
CopyFormatOptions *opts_out,
bool is_from,
+ void *cstate, /* CopyToState* for !is_from, CopyFromState* for is_from */
List *options)
{
bool format_specified = false;
@@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
else
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname),
- parser_errposition(pstate, defel->location)));
+ {
+ bool processed;
+ if (is_from)
+ processed = opts_out->from_ops->process_option_fn(cstate, defel);
+ else
+ processed = opts_out->to_ops->process_option_fn(cstate, defel);
+ if (!processed)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option \"%s\" not recognized",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
}

/*
----

> For the name, I thought private_data might be a better candidate than
> opaque, but I do not insist.

I don't have a strong opinion for this. Here are the number
of headers that use "private_data" and "opaque":

$ grep -r private_data --files-with-matches src/include | wc -l
6
$ grep -r opaque --files-with-matches src/include | wc -l
38

It seems that we use "opaque" than "private_data" in general.

but it seems that we use
"opaque" than "private_data" in our code.

> Do you use the arrow library to control the memory?

Yes.

> Is there a way that
> we can let the arrow use postgres' memory context?

Yes. Apache Arrow C++ provides a memory pool feature and we
can implement PostgreSQL's memory context based memory pool
for this. (But this is a custom COPY TO/FROM handler's
implementation details.)

> I'm not sure this
> is necessary, just raise the question for discussion.

Could you clarify what should we discuss? We should require
that COPY TO/FROM handlers should use PostgreSQL's memory
context for all internal memory allocations?

> +PG_FUNCTION_INFO_V1(copy_testfmt_handler);
> +Datum
> +copy_testfmt_handler(PG_FUNCTION_ARGS)
> +{
> + bool is_from = PG_GETARG_BOOL(0);
> + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
> +
>
> extra semicolon.

I noticed it too :-)
But I ignored it because the current implementation is only
for discussion. We know that it may be dirty.

Thanks,
--
kou

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-01-10 06:25:36 Re: Add BF member koel-like indentation checks to SanityCheck CI
Previous Message John Naylor 2024-01-10 06:14:25 Re: Add BF member koel-like indentation checks to SanityCheck CI