Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Christoph Moench-Tegeder" <cmt(at)burggraben(dot)net>
Cc: "Matthias Apitz" <guru(at)unixarea(dot)de>,pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server
Date: 2019-10-12 13:52:27
Message-ID: 69f69a9e-78eb-4448-b820-533b581c8677@manitou-mail.org
Lists: pgsql-general

Christoph Moench-Tegeder wrote:

> And then it doesn't know that your terminal expects UTF-8 (perl
> just dumps the binary string here), because you didn't tell it:
> "binmode(STDOUT, ':encoding(utf8)')" would fix that.

Or use perl -C, so that Perl picks up the output encoding from the locale environment.

From https://perldoc.perl.org/perlrun.html :

-C on its own (not followed by any number or option list), or the
empty string "" for the PERL_UNICODE environment variable, has the
same effect as -CSDL. In other words, the standard I/O handles and
the default open() layer are UTF-8-fied but only if the locale
environment variables indicate a UTF-8 locale.
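
To illustrate what such an output layer changes, here is a minimal sketch. The helpers bytes_written and hex_bytes are hypothetical names made up for this example, and an in-memory filehandle stands in for STDOUT so that the bytes actually written can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): print $text to an in-memory
# filehandle opened with the given layer and return the bytes written.
sub bytes_written {
    my ($layer, $text) = @_;
    my $buf = '';
    open(my $fh, ">$layer", \$buf) or die "open failed: $!";
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return $buf;
}

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

my $str = "\x{e4}";    # the single character U+00E4 ("ä")

my $raw  = bytes_written(':raw', $str);              # no encoding layer
my $utf8 = bytes_written(':encoding(UTF-8)', $str);  # encode on output

print "no encoding layer: ", hex_bytes($raw),  "\n";
print ":encoding(UTF-8):  ", hex_bytes($utf8), "\n";
```

Without a layer the character goes out as the single byte e4; with the layer it goes out as the two UTF-8 bytes c3 a4. Calling binmode(STDOUT, ':encoding(UTF-8)'), or running with perl -C in a UTF-8 locale, puts a UTF-8 output layer on STDOUT in the same spirit.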

Now for what the OP is doing, I'd suggest using Dump() from the
Devel::Peek module instead of print, since it shows the string's
internal bytes and flags.

To see the difference between a literal "ä" and "\xc3\xa4" from the
point of view of Perl:

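use Devel::Peek;
use utf8;    # the source file is UTF-8, so the literal "ä" is one character

# Two characters, U+00C3 and U+00A4 (the UTF-8 bytes of "ä" taken as characters)
$str = "\xc3\xa4";
Dump($str);

# One character, U+00E4
$str = "ä";
Dump($str);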

Result:

SV = PV(0x55af63beeda0) at 0x55af63c185d0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x55af63c3c230 "\303\244"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 1
SV = PV(0x55af63beeda0) at 0x55af63c185d0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x55af63c58dc0 "\303\244"\0 [UTF8 "\x{e4}"]
  CUR = 2
  LEN = 10
  COW_REFCNT = 1
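
The same difference can be checked from within a script. The following is only a sketch: it assumes the file is saved as UTF-8, the variable names are made up for illustration, and the last step is my guess at how a "UTF-8 encoded again" string can come about, not something confirmed in the thread.

```perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

my $bytes = "\xc3\xa4";    # two characters: U+00C3, U+00A4
my $chars = "ä";           # one character:  U+00E4

# Same internal buffer, but different length and flag from Perl's view.
printf "bytes: length=%d utf8_flag=%d\n",
    length($bytes), utf8::is_utf8($bytes) ? 1 : 0;    # length=2 utf8_flag=0
printf "chars: length=%d utf8_flag=%d\n",
    length($chars), utf8::is_utf8($chars) ? 1 : 0;    # length=1 utf8_flag=1

# Decoding the byte string as UTF-8 yields the one-character string.
my $decoded = $bytes;
utf8::decode($decoded) or die "not valid UTF-8";
print "decoded equals the literal\n" if $decoded eq $chars;

# Encoding the undecoded byte string treats each byte as a character,
# so the UTF-8 sequence is encoded a second time.
my $double = $bytes;
utf8::encode($double);
printf "double-encoded: %s\n",
    join ' ', map { sprintf '%02x', ord } split //, $double;
```

The last line prints c3 83 c2 a4, the four bytes one gets when "\xc3\xa4" is encoded to UTF-8 a second time instead of being decoded first.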

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
