From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | michael(at)paquier(dot)xyz |
Cc: | sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2024-02-05 09:05:15 |
Message-ID: | 20240205.180515.746623205615816813.kou@clear-code.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
In <ZcCKwAeFrlOqPBuN(at)paquier(dot)xyz>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 5 Feb 2024 16:14:08 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> So, I've looked at all that today, and finished by applying two
> patches as of 2889fd23be56 and 95fb5b49024a to get some of the
> weirdness with the workhorse routines out of the way.
Thanks!
> As this is called within the OneRow routine, I can live with that. If
> there is an opposition to that, we could just attach it within the
> routines.
I don't object the approach.
> I am attaching a v12 which is close to what I want it to be, with
> much more documentation and comments. There are two things that I've
> changed compared to the previous versions though:
> 1) I have added a callback to set up the input and output functions
> rather than attach that in the Start callback.
I'm OK with this. I just don't use them in Apache Arrow COPY
FORMAT extension.
> - No need for plugins to think about how to allocate this data. v11
> and other versions were doing things the wrong way by allocating this
> stuff in the wrong memory context as we switch to the COPY context
> when we are in the Start routines.
Oh, sorry. I missed it when I moved them.
> 2) I have backpedaled on the postpare callback, which did not bring
> much in clarity IMO while being a CSV-only callback. Note that we
> have in copyfromparse.c more paths that are only for CSV but the past
> versions of the patch never cared about that. This makes the text and
> CSV implementations much closer to each other, as a result.
Ah, sorry. I forgot to eliminate cstate->opts.csv_mode in
CopyReadLineText(). The postpare callback is for
optimization. If it doesn't improve performance, we don't
need to introduce it.
We may want to try eliminating cstate->opts.csv_mode in
CopyReadLineText() for performance. But we don't need to
do this in introducing CopyFromRoutine. We can defer it.
So I don't object removing the postpare callback.
> Let me know if you have
> comments about all that.
Here are some comments for the patch:
+ /*
+ * Called when COPY FROM is started to set up the input functions
+ * associated to the relation's attributes writing to. `fmgr_info` can be
fmgr_info ->
finfo
+ * optionally filled to provide the catalog information of the input
+ * function. `typioparam` can be optinally filled to define the OID of
optinally ->
optionally
+ * the type to pass to the input function. `atttypid` is the OID of data
+ * type used by the relation's attribute.
+ */
+ void (*CopyFromInFunc) (Oid atttypid, FmgrInfo *finfo,
+ Oid *typioparam);
How about passing CopyFromState cstate too like other
callbacks for consistency?
+ /*
+ * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+ *
+ * 'econtext' is used to evaluate default expression for each column that
+ * is either not read from the file or is using the DEFAULT option of COPY
or is ->
or
(I'm not sure...)
+ * FROM. It is NULL if no default values are used.
+ *
+ * Returns false if there are no more tuples to copy.
+ */
+ bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
+typedef struct CopyToRoutine
+{
+ /*
+ * Called when COPY TO is started to set up the output functions
+ * associated to the relation's attributes reading from. `fmgr_info` can
fmgr_info ->
finfo
+ * be optionally filled. `atttypid` is the OID of data type used by the
+ * relation's attribute.
+ */
+ void (*CopyToOutFunc) (Oid atttypid, FmgrInfo *finfo);
How about passing CopyToState cstate too like other
callbacks for consistency?
@@ -200,4 +204,10 @@ extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
extern int CopyReadAttributesCSV(CopyFromState cstate);
extern int CopyReadAttributesText(CopyFromState cstate);
+/* Callbacks for CopyFromRoutine->OneRow */
CopyFromRoutine->OneRow ->
CopyFromRoutine->CopyFromOneRow
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
+
#endif /* COPYFROM_INTERNAL_H */
+/*
+ * CopyFromTextStart
CopyFromTextStart ->
CopyFromBinaryStart
+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+ /* Read and verify binary header */
+ ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromTextEnd
CopyFromTextEnd ->
CopyFromBinaryEnd
+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+ /* nothing to do */
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 91433d439b..d02a7773e3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -473,6 +473,7 @@ ConvertRowtypeExpr
CookedConstraint
CopyDest
CopyFormatOptions
+CopyFromRoutine
CopyFromState
CopyFromStateData
CopyHeaderChoice
@@ -482,6 +483,7 @@ CopyMultiInsertInfo
CopyOnErrorChoice
CopySource
CopyStmt
+CopyToRoutine
CopyToState
CopyToStateData
Cost
Wow! I didn't know that we need to update typedefs.list when
I add a "typedef struct".
Thanks,
--
kou
From | Date | Subject | |
---|---|---|---|
Next Message | Bertrand Drouvot | 2024-02-05 09:45:50 | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Previous Message | Graham Leggett | 2024-02-05 08:51:50 | Re: Grant read-only access to exactly one database amongst many |