Re: confusing / inefficient "need_transcoding" handling in copy

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: michael(at)paquier(dot)xyz
Cc: andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: confusing / inefficient "need_transcoding" handling in copy
Date: 2024-12-12 06:25:41
Message-ID: 20241212.152541.1227846217843897891.kou@clear-code.com
Lists: pgsql-hackers

Hi,

In <Z1fKrTkT-eIVAK7F(at)paquier(dot)xyz>
"Re: confusing / inefficient "need_transcoding" handling in copy" on Tue, 10 Dec 2024 13:59:25 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> client_encoding would be used by COPY when not specifying ENCODING
> option. Perhaps more tests should be added with this value specified
> by a SET client_encoding?

That makes sense; I had missed that case. I've added it to
the v3 patch.

> Another one would be valid conversions back and forth. For example,
> I recall that LATIN1 accepts any bytes and can apply a conversion to
> UTF-8, so we could use it and expand a bit more the proposed tests?
> Or something like that?

OK. I've also added valid cases, using LATIN1 as you
suggested.
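
For illustration only (this is not taken from the patch), the
property that makes LATIN1 useful for a "valid" case can be sketched
outside PostgreSQL. The sketch assumes that Python's "latin-1" codec
is an acceptable stand-in for PostgreSQL's LATIN1; the helper name
latin1_to_utf8 is hypothetical. It shows that every byte value
decodes, and that UTF-8 data read as LATIN1 converts without error
even though the result is no longer the original character:

```python
# Illustration only: Python's "latin-1" codec stands in for PostgreSQL's
# LATIN1 here; this is an assumption, not PostgreSQL's conversion code.

def latin1_to_utf8(raw: bytes) -> bytes:
    """Interpret raw bytes as LATIN1 and re-encode the text as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

# Every one of the 256 possible byte values decodes under LATIN1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# U+3042 (HIRAGANA LETTER A) is 0xe3 0x81 0x82 in UTF-8.
utf8_a = "\u3042".encode("utf-8")

# Read as LATIN1, the three bytes become three separate characters,
# each of which then takes two bytes in UTF-8.
misread = latin1_to_utf8(utf8_a)

print(len(decoded))        # 256
print(utf8_a.hex(" "))     # e3 81 82
print(misread.hex(" "))    # c3 a3 c2 81 c2 82
```

Note that PostgreSQL itself does not accept a zero byte in text
values, so the "any bytes" property holds there only for non-zero
bytes.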

> This is not going to be portable across the buildfarm. Two reasons
> are spotted by the CI (there may be others):
> 1) For Windows, as in the following regression.diffs:
> COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
> +ERROR: character with byte sequence 0xe3 0x81 0x82 in encoding "UTF8" has no equivalent in encoding "WIN1252"
> 2) Second failure on Linux, with 32-bit builds:
> COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
> +ERROR: conversion between UTF8 and SQL_ASCII is not supported
>
> Likely, this should be made conditional, based on the fact that the
> database needs to be able to support utf8? There are a couple of
> examples like that in the tree, based on the following SQL trick:
> SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset
> \if :skip_test
> \quit
> \endif

Thanks. I hadn't noticed the portability problem. I've added
the skip trick.
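
As a rough illustration of the Windows failure quoted above (again
not part of the patch), the following sketch checks whether U+3042
can be represented in a given encoding. It assumes that Python's
"cp1252" codec corresponds to PostgreSQL's WIN1252; the helper name
can_encode is hypothetical:

```python
# Illustration only: Python's "cp1252" codec is assumed to correspond to
# PostgreSQL's WIN1252.

def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

hiragana_a = "\u3042"  # HIRAGANA LETTER A

# The UTF-8 byte sequence matches the one in the quoted error message.
print(hiragana_a.encode("utf-8").hex(" "))   # e3 81 82
print(can_encode(hiragana_a, "utf-8"))       # True
print(can_encode(hiragana_a, "cp1252"))      # False
```

The character has no WIN1252 equivalent, which is consistent with the
error in the quoted regression.diffs and with restricting the test to
databases whose encoding is UTF8.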

> This requires an alternate output for the non-utf8 case.

Oh! I didn't know about the "XXX_1.out" feature.

Thanks,
--
kou

Attachment                                                  Content-Type   Size
v3-0001-Add-tests-for-invalid-encoding-for-COPY-FROM.patch  text/x-patch   3.6 KB
