Re: inserts bypass encoding conversion

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "James Pang (chaolpan)" <chaolpan(at)cisco(dot)com>
Cc: "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: inserts bypass encoding conversion
Date: 2023-08-17 02:40:32
Message-ID: 1727535.1692240032@sss.pgh.pa.us
Lists: pgsql-admin

"James Pang (chaolpan)" <chaolpan(at)cisco(dot)com> writes:
> So, insert into values(chr(226)||chr(128)||chr(166)) actually got stored in database with LATIN1 with single byte sequence, but when query select * from testutf8, it got converted to UTF8 three byte sequence first ?

There are no LATIN1 characters that have longer than 2-byte UTF8
representations, so no.
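A quick check of this claim, sketched in Python rather than SQL. It assumes that Python's "latin-1" codec matches PostgreSQL's LATIN1, since both map byte values 0-255 onto code points U+0000-U+00FF. The sketch re-encodes every LATIN1 character as UTF-8 and reports the longest result. The function name is hypothetical.

```python
def max_utf8_len_of_latin1() -> int:
    """Longest UTF-8 encoding of any single LATIN1 character (hypothetical helper)."""
    longest = 0
    for byte_value in range(256):
        # Decode one LATIN1 byte to its character, then re-encode that character as UTF-8.
        char = bytes([byte_value]).decode("latin-1")
        longest = max(longest, len(char.encode("utf-8")))
    return longest

print(max_utf8_len_of_latin1())  # 2
```

Bytes 0x00-0x7F stay one byte and 0x80-0xFF become two-byte sequences. Converting a single LATIN1 character therefore never yields a three-byte UTF-8 sequence such as 0xe2 0x80 0xa6.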

I think your fundamental misunderstanding is supposing that this:

chr(226)||chr(128)||chr(166)

produces something equivalent to the UTF8 sequence 0xe2 0x80 0xa6.
It will not, no matter which server encoding you are dealing with.
It will produce something that is three separate characters
according to the server encoding. In LATIN1, that could well be
the byte sequence 0xe2 0x80 0xa6, but *that byte sequence does not
mean the same thing that it would mean in UTF8 encoding*.
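The same point can be shown with a small Python model. It relies on the PostgreSQL documentation for chr(): in a UTF8 database the argument is treated as a Unicode code point, and in a single-byte encoding such as LATIN1 it is the character's own code, which for LATIN1 happens to equal the Unicode code point. The helper stored_bytes is hypothetical and only approximates what the server does.

```python
ELLIPSIS = "\u2026"  # the single character whose UTF-8 encoding is e2 80 a6

def stored_bytes(codes, server_encoding):
    """Hypothetical model of what chr(c1)||chr(c2)||... stores on the server."""
    text = "".join(chr(code) for code in codes)  # one character per chr() call
    return text, text.encode(server_encoding)

for enc in ("latin-1", "utf-8"):
    text, raw = stored_bytes([226, 128, 166], enc)
    # Character count, whether it equals the ellipsis, and the stored bytes.
    print(enc, len(text), text == ELLIPSIS, raw.hex(" "))
# latin-1 3 False e2 80 a6
# utf-8 3 False c3 a2 c2 80 c2 a6
```

In both encodings the expression produces three characters and never the single ellipsis. In LATIN1 the stored bytes happen to be e2 80 a6, but in that encoding those bytes mean "â", the control character U+0080, and "¦".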

You also seem not to grasp the fact that an encoding conversion
will happen between your client and the server if client_encoding
is different from server_encoding. Because of that, the output of
a SELECT command doesn't prove much of anything here.
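To see why SELECT output alone is inconclusive, the two conversion directions can be modeled in Python as a decode followed by an encode. This is a simplification: the real server uses its own conversion routines, and both function names below are hypothetical. When client_encoding equals server_encoding, no conversion is performed and the bytes pass through unchanged.

```python
def server_to_client(stored: bytes, server_encoding: str, client_encoding: str) -> bytes:
    """Hypothetical model of the conversion applied to SELECT output."""
    return stored.decode(server_encoding).encode(client_encoding)

def client_to_server(sent: bytes, client_encoding: str, server_encoding: str) -> bytes:
    """Hypothetical model of the conversion applied to incoming query text.

    Raises UnicodeEncodeError if a character has no equivalent in the server encoding.
    """
    return sent.decode(client_encoding).encode(server_encoding)

# LATIN1 server, UTF8 client: the three stored bytes come back as six bytes.
stored = bytes([0xE2, 0x80, 0xA6])
print(server_to_client(stored, "latin-1", "utf-8").hex(" "))  # c3 a2 c2 80 c2 a6

# A UTF8 client sending a literal ellipsis: LATIN1 has no such character.
try:
    client_to_server("\u2026".encode("utf-8"), "utf-8", "latin-1")
except UnicodeEncodeError:
    print("no LATIN1 equivalent for U+2026")
```

A UTF8 client that sees c3 a2 c2 80 c2 a6 is therefore looking at the converted characters "â", U+0080 and "¦", not at the bytes stored on disk. The real server reports the second case as an error because the character has no equivalent in LATIN1.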

regards, tom lane
