From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | andres(at)anarazel(dot)de |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp |
Subject: | Re: confusing / inefficient "need_transcoding" handling in copy |
Date: | 2024-02-08 08:25:01 |
Message-ID: | 20240208.172501.2177371292839763981.kou@clear-code.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
In <20240206222445(dot)hzq22pb2nye7rm67(at)awork3(dot)anarazel(dot)de>
"Re: confusing / inefficient "need_transcoding" handling in copy" on Tue, 6 Feb 2024 14:24:45 -0800,
Andres Freund <andres(at)anarazel(dot)de> wrote:
> One unfortunate issue: We don't have any tests verifying that COPY FROM
> catches encoding issues.
How about the attached patch for it?
How about the following to avoid needless transcoding?
----
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index bd4710a79b..80ec26cafe 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -612,13 +612,14 @@ BeginCopyTo(ParseState *pstate,
cstate->file_encoding = cstate->opts.file_encoding;
/*
- * Set up encoding conversion info. Even if the file and server encodings
- * are the same, we must apply pg_any_to_server() to validate data in
- * multibyte encodings.
+ * Set up encoding conversion info. We use pg_server_to_any() for the
+ * conversion. pg_server_to_any() does nothing with an encoding that
+ * equals to GetDatabaseEncoding() and PG_SQL_ASCII. If
+ * cstate->file_encoding equals to GetDatabaseEncoding() and PG_SQL_ASCII,
+ * we don't need to transcode.
*/
- cstate->need_transcoding =
- (cstate->file_encoding != GetDatabaseEncoding() ||
- pg_database_encoding_max_length() > 1);
+ cstate->need_transcoding = !(cstate->file_encoding == GetDatabaseEncoding() ||
+ cstate->file_encoding == PG_SQL_ASCII);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
----
Numbers on my environment:
master: 861.646ms
patched: 697.547ms
SQL:
COPY (SELECT 1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2, generate_series(1, 1000000::int4)) TO '/dev/null' \watch c=5
Thanks,
--
kou
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-a-test-for-invalid-encoding-for-COPY-FROM.patch | text/x-patch | 2.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2024-02-08 08:25:07 | Re: confusing / inefficient "need_transcoding" handling in copy |
Previous Message | Daniel Gustafsson | 2024-02-08 08:13:10 | Re: What about Perl autodie? |