From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Sutou Kouhei <kou(at)clear-code(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Subject: | Re: confusing / inefficient "need_transcoding" handling in copy |
Date: | 2024-02-08 07:05:39 |
Message-ID: | ZcR9Q9hJ8GedFSCd@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Feb 06, 2024 at 02:24:45PM -0800, Andres Freund wrote:
> I think the code is just very confusing - there actually *is* verification of
> the encoding, it just happens at a different, earlier, layer, namely in
> copyfromparse.c: CopyConvertBuf() which says:
> /*
> * If the file and server encoding are the same, no encoding conversion is
> * required. However, we still need to verify that the input is valid for
> * the encoding.
> */
>
> And does indeed verify that.
This has been switched by Heikki in f82de5c46bdf, in 2021, that has
removed pg_database_encoding_max_length() in the COPY FROM case.
(Adding him now in CC, in case he has any comments).
> One unfortunate issue: We don't have any tests verifying that COPY FROM
> catches encoding issues.
Oops.
Anyway, I was looking at the copyto.c code because I need to get
something on this thread to be able to do something about the
pluggable APIs in COPY, and echoing with what you mentioned upthread,
what we only need to do is to set need_transcoding only when the
client and the server encodings are not the same? Am I missing
something?
Runtime gets much better in average, around 1260ms on HEAD vs 1023ms
with the attached for the example of upthread on a single process.
Some profile data from CopyOneRowTo(), if relevant:
* HEAD:
- 82.78% 10.96% postgres postgres [.] CopyOneRowTo
- 71.82% CopyOneRowTo
+ 30.87% OutputFunctionCall
- 13.21% CopyAttributeOutText
pg_server_to_any
- 9.48% appendBinaryStringInfo
4.93% enlargeStringInfo
3.33% 0xffffa4e1e234
+ 3.20% CopySendEndOfRow
2.66% 0xffffa4e1e214
1.02% pgstat_progress_update_param
0.86% memcpy(at)plt
0.74% 0xffffa4e1cba4
0.72% MemoryContextReset
0.72% 0xffffa4e1cba8
0.59% enlargeStringInfo
0.55% 0xffffa4e1cb40
0.54% 0xffffa4e1cb74
0.52% 0xffffa4e1cb8c
+ 10.96% _start
* patch:
- 80.82% 12.25% postgres postgres [.] CopyOneRowTo
- 68.57% CopyOneRowTo
+ 36.55% OutputFunctionCall
11.44% CopyAttributeOutText
+ 8.87% appendBinaryStringInfo
+ 3.79% CopySendEndOfRow
1.01% pgstat_progress_update_param
0.79% int2out
0.66% MemoryContextReset
0.63% 0xffffaa624ba8
0.60% memcpy(at)plt
0.60% enlargeStringInfo
0.53% 0xffffaa624ba4
+ 12.25% _start
That's a performance-only change, but there may be a good argument for
backpatching something, perhaps?
--
Michael
Attachment | Content-Type | Size |
---|---|---|
0001-Speedup-COPY-TO.patch | text/x-diff | 1.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2024-02-08 07:06:28 | Re: Testing autovacuum wraparound (including failsafe) |
Previous Message | Peter Eisentraut | 2024-02-08 07:01:00 | Re: What about Perl autodie? |