From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | michael(at)paquier(dot)xyz |
Cc: | andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp |
Subject: | Re: confusing / inefficient "need_transcoding" handling in copy |
Date: | 2024-12-12 06:25:41 |
Message-ID: | 20241212.152541.1227846217843897891.kou@clear-code.com |
Lists: | pgsql-hackers |
Hi,
In <Z1fKrTkT-eIVAK7F(at)paquier(dot)xyz>
"Re: confusing / inefficient "need_transcoding" handling in copy" on Tue, 10 Dec 2024 13:59:25 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> client_encoding would be used by COPY when not specifying ENCODING
> option. Perhaps more tests should be added with this value specified
> by a SET client_encoding?
That makes sense; I had missed that case. I've added it to
the v3 patch.
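For illustration only, here is a minimal psql sketch of the case described above. It assumes a UTF8 database and is not the test from the v3 patch. Because no ENCODING option is given, COPY converts its output to the session's client_encoding:

```sql
-- Hypothetical sketch, assuming the database encoding is UTF8.
-- No ENCODING option is given, so COPY TO converts the output
-- to the current client_encoding (LATIN1 here): 'é' is sent as byte 0xE9.
SET client_encoding TO 'LATIN1';
COPY (SELECT E'\u00e9') TO STDOUT WITH (FORMAT csv);
RESET client_encoding;
```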
> Another one would be valid conversions back and forth. For example,
> I recall that LATIN1 accepts any bytes and can apply a conversion to
> UTF-8, so we could use it and expand a bit more the proposed tests?
> Or something like that?
OK. I've also added valid cases, using LATIN1 as you
suggested.
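A hedged sketch of why LATIN1 suits the valid cases, again assuming a UTF8 database. LATIN1 is a single-byte encoding that assigns a character to every byte value (apart from NUL, which PostgreSQL never accepts in text). Any such byte therefore converts cleanly to UTF-8 and back. The expressions below are illustrative and are not taken from the patch:

```sql
-- Hypothetical sketch, assuming a UTF8 database.
-- Byte 0xE9 is 'é' in LATIN1; convert_from() decodes it into the
-- database encoding, and convert_to() encodes it again.
SELECT convert_from('\xe9'::bytea, 'LATIN1') AS decoded,
       convert_to(convert_from('\xe9'::bytea, 'LATIN1'), 'UTF8') AS as_utf8,
       convert_to(convert_from('\xe9'::bytea, 'LATIN1'), 'LATIN1') AS round_trip;
```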
> This is not going to be portable across the buildfarm. Two reasons
> are spotted by the CI (there may be others):
> 1) For Windows, as in the following regression.diffs:
> COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
> +ERROR: character with byte sequence 0xe3 0x81 0x82 in encoding "UTF8" has no equivalent in encoding "WIN1252"
> 2) Second failure on Linux, with 32-bit builds:
> COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
> +ERROR: conversion between UTF8 and SQL_ASCII is not supported
>
> Likely, this should be made conditional, based on the fact that the
> database needs to be able to support utf8? There are a couple of
> examples like that in the tree, based on the following SQL trick:
> SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset
> \if :skip_test
> \quit
> \endif
Thanks. I hadn't noticed the portability problem. I've added
the skip trick.
> This requires an alternate output for the non-utf8 case.
Oh! I didn't know about the "XXX_1.out" feature.
Thanks,
--
kou
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Add-tests-for-invalid-encoding-for-COPY-FROM.patch | text/x-patch | 3.6 KB |