Re: confusing / inefficient "need_transcoding" handling in copy

From:	Sutou Kouhei <kou(at)clear-code(dot)com>
To:	michael(at)paquier(dot)xyz
Cc:	andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject:	Re: confusing / inefficient "need_transcoding" handling in copy
Date:	2024-12-12 06:25:41
Message-ID:	20241212.152541.1227846217843897891.kou@clear-code.com
Lists:	pgsql-hackers

Hi,

In <Z1fKrTkT-eIVAK7F(at)paquier(dot)xyz>
"Re: confusing / inefficient "need_transcoding" handling in copy" on Tue, 10 Dec 2024 13:59:25 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> client_encoding would be used by COPY when not specifying ENCODING
> option. Perhaps more tests should be added with this value specified
> by a SET client_encoding?

That makes sense. I missed that case and have added it to
the v3 patch.

> Another one would be valid conversions back and forth. For example,
> I recall that LATIN1 accepts any bytes and can apply a conversion to
> UTF-8, so we could use it and expand a bit more the proposed tests?
> Or something like that?

OK. I've also added valid cases, using LATIN1 as you
suggested.
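
For illustration only (this is not part of the patch): a minimal Python sketch of why LATIN1 is convenient for the valid round-trip cases. It assumes Python's `latin-1` codec is an adequate stand-in for PostgreSQL's LATIN1 conversion, and the helper names `latin1_to_utf8` and `utf8_to_latin1` are hypothetical.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Interpret raw bytes as LATIN1 and re-encode them as UTF-8."""
    # Every byte value maps to a code point in Python's latin-1 codec,
    # so the decode step cannot fail on any input.
    return raw.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 bytes back to LATIN1 (fails above U+00FF)."""
    return data.decode("utf-8").encode("latin-1")


# Skip 0x00 because PostgreSQL text values cannot contain NUL bytes.
all_bytes = bytes(range(1, 256))
converted = latin1_to_utf8(all_bytes)

# The conversion is lossless in both directions.
assert utf8_to_latin1(converted) == all_bytes
```

The round trip succeeding for every non-NUL byte is what makes LATIN1 input a safe source of "valid" data in this sketch; the actual tests in the patch may be structured differently.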

> This is not going to be portable across the buildfarm. Two reasons
> are spotted by the CI (there may be others):
> 1) For Windows, as in the following regression.diffs:
> COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
> +ERROR: character with byte sequence 0xe3 0x81 0x82 in encoding "UTF8" has no equivalent in encoding "WIN1252"
> 2) Second failure on Linux, with 32-bit builds:
> COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
> +ERROR: conversion between UTF8 and SQL_ASCII is not supported
>
> Likely, this should be made conditional, based on the fact that the
> database needs to be able to support utf8? There are a couple of
> examples like that in the tree, based on the following SQL trick:
> SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset
> \if :skip_test
> \quit
> \endif

Thanks. I hadn't noticed the portability problem. I've added
the skip trick.
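
For illustration only: the Windows failure quoted above can be mimicked outside PostgreSQL with a short Python sketch. It assumes Python's `cp1252` codec approximates PostgreSQL's WIN1252 encoding, and the helper name `has_equivalent` is hypothetical.

```python
def has_equivalent(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


hiragana_a = "\u3042"  # the character used in the quoted COPY statement

# UTF-8 byte sequence of U+3042, matching the bytes in the quoted error.
utf8_hex = hiragana_a.encode("utf-8").hex(" ")
print(utf8_hex)  # e3 81 82

# Representable in UTF-8, but not in the cp1252 stand-in for WIN1252.
print(has_equivalent(hiragana_a, "utf-8"))   # True
print(has_equivalent(hiragana_a, "cp1252"))  # False
```

This is why a test that emits U+3042 only behaves the same everywhere when the database encoding is UTF8, which is the condition the skip trick checks.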

> This requires an alternate output for the non-utf8 case.

Oh! I didn't know about the "XXX_1.out" feature.

Thanks,
--
kou

Attachment: v3-0001-Add-tests-for-invalid-encoding-for-COPY-FROM.patch (text/x-patch, 3.6 KB)

In response to

Re: confusing / inefficient "need_transcoding" handling in copy at 2024-12-10 04:59:25 from Michael Paquier

Responses

Re: confusing / inefficient "need_transcoding" handling in copy at 2024-12-13 03:03:45 from Michael Paquier
