From: | Christoph Moench-Tegeder <cmt(at)burggraben(dot)net> |
---|---|
To: | Matthias Apitz <guru(at)unixarea(dot)de> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server |
Date: | 2019-10-12 13:14:24 |
Message-ID: | 20191012131424.GA2452@elch.exwg.net |
Lists: | pgsql-general |
## Matthias Apitz (guru(at)unixarea(dot)de):
> but when I now fetch the first row with:
>
> @row = $sth->fetchrow_array;
> $HexStr = unpack("H*", $row[0]);
> print "HexStr: " . $HexStr . "\n";
> print "$row[0]\n";
>
> The resulting column contains ISO data:
As expected: https://perldoc.perl.org/perluniintro.html
Specifically, if all code points in the string are 0xFF or less, Perl
uses the native eight-bit character set.
> P<E4>dagogische Hochschule Weingarten
And then Perl doesn't know that your terminal expects UTF-8 (it
just dumps the binary string here), because you didn't tell it:
"binmode(STDOUT, ':encoding(utf8)')" would fix that.
See: https://perldoc.perl.org/perlunifaq.html specifically "What if I
don't decode?", "What if I don't encode?" and "Is there a way to
automatically decode or encode?".
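The difference between the character string and the bytes a UTF-8 terminal expects can be sketched as follows. This is a minimal illustration without a database: the literal string stands in for $row[0], on the assumption that the driver hands back decoded characters, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for $row[0] (assumption: the driver returned decoded characters).
# \x{E4} is a-umlaut, a code point below 0x100.
my $name = "P\x{E4}dagogische Hochschule Weingarten";

# Explicit encoding shows what a UTF-8 terminal wants to receive:
# one character (0xE4) becomes two bytes (0xC3 0xA4).
my $bytes = encode('UTF-8', $name);
printf "characters: %d, UTF-8 bytes: %d\n", length($name), length($bytes);
print "first bytes: ", unpack('H*', substr($bytes, 0, 3)), "\n";   # 50c3a4

# With an encoding layer on STDOUT, print() does that conversion itself.
binmode(STDOUT, ':encoding(UTF-8)');
print "$name\n";
```

Without the binmode() call the last print would emit the single byte E4, which is what shows up as "P<E4>dagogische" on a UTF-8 terminal.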
The whole PostgreSQL-DBI-UTF8-thingy is working: use "Tĳl Müller"
as test data (that's the Dutch "ĳ" digraph in there, a character
decidedly not in Latin-9 (ISO 8859-15) and therefore not "0xFF or less").
That will break your "unpack('H*')": it tries to unpack that wide
character into a hex byte and warns "Character in 'H' format wrapped
in unpack". Use "print(join(' ', unpack('U*', $row[0])))" to see that
the ĳ has codepoint 307 (decimal).
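A small sketch of that test, again without a database: the string below is a hypothetical stand-in for $row[0], written with escapes so the source file encoding does not matter (\x{133} is the ĳ ligature, \x{FC} is ü). Encoding to UTF-8 before the hex dump is one possible way to keep using 'H*'; it is a suggestion, not something the original script did.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for $row[0]; contains a code point above 0xFF.
my $name = "T\x{133}l M\x{FC}ller";

# unpack('U*') yields one code point per character.
my @codepoints = unpack('U*', $name);
print join(' ', @codepoints), "\n";

# unpack('H*') on the character string warns, because 307 does not fit
# into one byte. Capture the warning instead of letting it go to STDERR.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $wrapped = unpack('H*', $name);
}
print "warning: $_" for @warnings;

# Encoding first gives a well-defined hex dump of the UTF-8 bytes.
my $hex = unpack('H*', encode('UTF-8', $name));
print "$hex\n";
```

The second value printed by the first print is 307, and the hex dump contains c4b3 for the ĳ and c3bc for the ü.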
Regards,
Christoph
--
Spare Space