Re: How can I return a UTF8 string from a hex representation in a latin9 database?

From: "PGUser2020" <pg(at)diorite(dot)uk>
To: "Erik Wienhold" <ewie(at)ewie(dot)name>
Cc: "pgsql-general" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How can I return a UTF8 string from a hex representation in a latin9 database?
Date: 2023-11-20 07:24:45
Message-ID: E1r4yeD-0004Mk-9R@rmmprod06.runbox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 2023-11-20 02:59 +00:00 GMT, "Erik Wienhold" <ewie(at)ewie(dot)name> wrote:
> On 2023-11-19 21:37 +0100, PGUser2020 wrote:

>
> Technically speaking, UTF-8 is an 8-bit encoding. But I guess that
> application would then show mojibake if UTF-8 were stored.
>

Yes sorry, I should have said single byte rather than 8 bit. There must be no possibility that a single character occupies more than one byte as the (e.g.) varchar(10) and char(5) fields overflow lengths otherwise.

>
> Do you have to use existing Latin-9 text columns to store UTF-8? If not
> then I'd go with bytea instead of text (varchar) if possible and also
> supported by your client. Otherwise it may be difficult to distinguish
> between "normal" Latin-9 text and the hex- or base64-encoded UTF-8.
> Although bytea could also store anything, not just UTF-8, so you'd have
> to deal with invalid data anyway.
>

I do have to use existing columns yes, and they are varchar latin9 columns.

> Is the same client sending and reading that data? If yes, why can't the
> client do the hex-encoding of the UTF-8 string and only send/read those
> encoded strings so that database won't event see UTF-8? Why must the
> database be involved in this custom encoding scheme instead of just
> storing BLOBs (either as bytea or some encoded text)?
>

So one of the external clients applications which is interacting with this database will do just that -- it will make a hex string from its utf8 input and store that in a varchar

>
> The client can disable encoding conversion by setting client_encoding to
> sql_ascii:
>
> latin9_test=# show server_encoding;
> server_encoding
> -----------------
> LATIN9
> (1 row)
>
> latin9_test=# set client_encoding to sql_ascii;
> SET
> latin9_test=# show client_encoding;
> client_encoding
> -----------------
> SQL_ASCII
> (1 row)
>
> latin9_test=# select convert_from(decode('ceb120ceb220ceb320ceb420ceb520cf83cf84', 'hex'), 'sql_ascii');
> convert_from
> ---------------------
> α β γ δ ε στ
> (1 row)
>
> Maybe that's also an option for your client.
>

It is very useful and exactly what I was looking for thanks.

This technique should allow me to create a login, mask a table with a view containing this decode, and use search_path to get the view returned in preference to the base table.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Laurenz Albe 2023-11-20 10:01:57 Re: pg_basebackup
Previous Message Matthias Apitz 2023-11-20 06:30:17 pg_basebackup