From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | sawada(dot)mshk(at)gmail(dot)com |
Cc: | david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2025-03-27 03:28:40 |
Message-ID: | 20250327.122840.991624455773269772.kou@clear-code.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
In <CAD21AoAfWrjpTDJ0garVUoXY0WC3Ud4Cu51q+ccWiotm1uo_2A(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Sun, 23 Mar 2025 02:01:59 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> ---
> +/*
> + * Process the "format" option.
> + *
> + * This function checks whether the option value is a built-in format such as
> + * "text" and "csv" or not. If the option value isn't a built-in format, this
> + * function finds a COPY format handler that returns a CopyToRoutine (for
> + * is_from == false) or CopyFromRountine (for is_from == true). If no COPY
> + * format handler is found, this function reports an error.
> + */
>
> I think this comment needs to be updated as the part "If the option
> value isn't ..." is no longer true.
>
> I think we don't necessarily need to create a separate function
> ProcessCopyOptionFormat for processing the format option.
Hmm. I think that this separated function will increase
readability by reducing indentation. But I've removed the
separation as you suggested. So the comment is also removed
entirely.
0002 includes this.
> We need more regression tests for handling the given format name. For example,
>
> - more various input patterns.
> - a function with the specified format name exists but it returns an
> unexpected Node.
> - looking for a handler function in a different namespace.
> etc.
I've added the following tests:
* Wrong input type handler without namespace
* Wrong input type handler with namespace
* Wrong return type handler without namespace
* Wrong return type handler with namespace
* Wrong return value (Copy*Routine isn't returned) handler without namespace
* Wrong return value (Copy*Routine isn't returned) handler with namespace
* Nonexistent handler
* Invalid qualified name
* Valid handler without namespace and without search_path
* Valid handler without namespace and with search_path
* Valid handler with namespace
0002 also includes this.
> I think that we should accept qualified names too as the format name
> like tablesample does. That way, different extensions implementing the
> same format can be used.
Implemented. It's implemented after parsing SQL. Is it OK?
(It seems that tablesample does it in parsing SQL.)
Because "WITH (FORMAT XXX)" is processed as a generic option
in gram.y. All generic options are processed as strings. So
I keep this.
Syntax is "COPY ... WITH (FORMAT 'NAMESPACE.HANDLER_NAME')"
not "COPY ... WITH (FORMAT 'NAMESPACE'.'HANDLER_NAME')"
because of this choice.
0002 also includes this.
> ---
> + if (routine == NULL || !IsA(routine, CopyFromRoutine))
> + ereport(
> + ERROR,
> + (errcode(
> +
> ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("COPY handler function "
> + "%u did not return "
> + "CopyFromRoutine struct",
> + opts->handler)));
>
> It's not conventional to put a new line between 'ereport(' and 'ERROR'
> (similarly between 'errcode(' and 'ERRCODE_...'. Also, we don't need
> to split the error message into multiple lines as it's not long.
Oh, sorry. I can't remember why I used this... I think I
trusted pgindent...
> ---
> + descr => 'pseudo-type for the result of a copy to/from method function',
>
> s/method function/format function/
Good catch. I used "handler function" not "format function"
because we use "handler" in other places.
> ---
> + Oid handler; /* handler
> function for custom format routine */
>
> 'handler' is used also for built-in formats.
Updated in 0004.
> ---
> +static void
> +CopyFromInFunc(CopyFromState cstate, Oid atttypid,
> + FmgrInfo *finfo, Oid *typioparam)
> +{
> + ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
> +}
>
> OIDs could be changed across major versions even for built-in types. I
> think it's better to avoid using it for tests.
Oh, I didn't know it. I've changed to use type name instead
of OID. It'll be more stable than OID.
> ---
> +static void
> +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
> +{
> + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u",
> slot->tts_nvalid)));
> +}
>
> Similar to the above comment, the field name 'tts_nvalid' might also
> be changed in the future, let's use another name.
Hmm. If the field name is changed, we need to change this
code. So changing tests too isn't strange. Anyway, I used
more generic text.
> ---
> +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
> + .type = T_CopyFromRoutine,
> + .CopyFromInFunc = CopyFromInFunc,
> + .CopyFromStart = CopyFromStart,
> + .CopyFromOneRow = CopyFromOneRow,
> + .CopyFromEnd = CopyFromEnd,
> +};
>
> I'd suggest not using the same function names as the fields.
OK. I've added "Test" prefix.
> ---
> +/*
> + * Export CopySendEndOfRow() for extensions. We want to keep
> + * CopySendEndOfRow() as a static function for
> + * optimization. CopySendEndOfRow() calls in this file may be optimized by a
> + * compiler.
> + */
> +void
> +CopyToStateFlush(CopyToState cstate)
> +{
> + CopySendEndOfRow(cstate);
> +}
>
> Is there any reason to use a different name for public functions?
In this patch set, I use "CopyFrom"/"CopyTo" prefixes for
public APIs for custom COPY FORMAT handler extensions. It
will help understanding related APIs. Is it strange in
PostgreSQL?
> ---
> + /* For custom format implementation */
> + void *opaque; /* private space */
>
> How about renaming 'private'?
We should not use "private" because it's a keyword in
C++. If we use "private" here, we can't include this file
from C++ code.
> ---
> I've not reviewed the documentation patch yet but I think the patch
> seems to miss the updates to the description of the FORMAT option in
> the COPY command section.
I defer this for now. We can revisit the last documentation
patch after we finalize our API. (Or could someone help us?)
> I think we can reorganize the patch set as follows:
>
> 1. Create copyto_internal.h and change COPY_XXX to COPY_SOURCE_XXX and
> COPY_DEST_XXX accordingly.
> 2. Support custom format for both COPY TO and COPY FROM.
> 3. Expose necessary helper functions such as CopySendEndOfRow().
> 4. Add CopyFromSkipErrorRow().
> 5. Documentation.
The attached v39 patch set uses the followings:
0001: Create copyto_internal.h and change COPY_XXX to
COPY_SOURCE_XXX and COPY_DEST_XXX accordingly.
(Same as 1. in your suggestion)
0002: Support custom format for both COPY TO and COPY FROM.
(Same as 2. in your suggestion)
0003: Expose necessary helper functions such as CopySendEndOfRow()
and add CopyFromSkipErrorRow().
(3. + 4. in your suggestion)
0004: Define handler functions for built-in formats.
(Not included in your suggestion)
0005: Documentation. (WIP)
(Same as 5. in your suggestion)
We can merge 0001 quickly, right?
Thanks,
--
kou
Attachment | Content-Type | Size |
---|---|---|
v39-0001-Export-CopyDest-as-private-data.patch | text/x-patch | 7.8 KB |
v39-0002-Add-support-for-adding-custom-COPY-format.patch | text/x-patch | 37.8 KB |
v39-0003-Add-support-for-implementing-custom-COPY-handler.patch | text/x-patch | 14.7 KB |
v39-0004-Use-copy-handlers-for-built-in-formats.patch | text/x-patch | 17.7 KB |
v39-0005-Add-document-how-to-write-a-COPY-handler.patch | text/x-patch | 14.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | wenhui qiu | 2025-03-27 03:34:31 | Re: GSoC 2025 - Looking for Beginner-Friendly PostgreSQL Project |
Previous Message | Tom Lane | 2025-03-27 03:17:17 | Re: Add Postgres module info |