Re: LATIN9 - hex in varchar after convert

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: steve(at)tusol(dot)co(dot)uk, PostGreSQL <pgsql-novice(at)postgresql(dot)org>
Subject: Re: LATIN9 - hex in varchar after convert
Date: 2020-04-26 02:24:39
Message-ID: 85be3e3e23ec52df22c699af0c5eec022f1ceb51.camel@cybertec.at
Lists: pgsql-novice

On Sat, 2020-04-25 at 10:41 +0100, Steve Tucknott (TuSol) wrote:
> I have a table with a varchar(5000) that contains general text. The table is typically
> maintained via a GUI, but on this occasion I received a spreadsheet with data and
> loaded it - via copy - from a csv extracted from that. The data looked fine in psql,
> but when looking at the data in the GUI, characters such as single quote marks (')
> appeared as a series of special characters. I assumed that the spreadsheet then had
> some different encoding (UTF8?) and that I then needed to 'translate' the characters.

Very likely, the characters were not really single quotes, but "curly quotes"
(Unicode U+2018 and U+2019, or the double variants U+201C and U+201D).

One of the following scenarios must have taken place:

1. The file was encoded in UTF-8, but when you copied the data in, the encoding
you specified (or had by default) was a single-byte encoding like LATIN9.

The curly quotes are more than one byte in UTF-8, but each byte was interpreted as
a LATIN9 character.

The solution would be to specify ENCODING 'UTF8' with COPY.

2. The characters are actually fine in the database, and you loaded them correctly,
and your database client encoding is UTF8, but your terminal is in LATIN9.

The characters were sent to your client correctly as UTF-8 bytes, but your
terminal interpreted each byte as a separate character.
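
In both scenarios the underlying effect is the same: a multi-byte UTF-8 sequence is read one byte at a time. Here is a minimal Python sketch of that effect, assuming the character in question was U+2019 (right single curly quote) and using Python's iso8859_15 codec as a stand-in for LATIN9. The sample text is made up for illustration.

```python
# A right single curly quote (U+2019) inside an ordinary word.
original = "It\u2019s"

# What a UTF-8 encoded CSV file would contain: the quote takes three bytes.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as LATIN9 (ISO 8859-15) turns every byte into
# one character, so the single quote becomes three characters.
misread = utf8_bytes.decode("iso8859_15")

print(utf8_bytes.hex(" "))          # 49 74 e2 80 99 73
print(len(original), len(misread))  # 4 6
```

The three characters that replace the quote are "â" followed by two control characters, which a GUI may render as boxes or other odd symbols.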

To determine which was the case, look at which bytes are in the database:

SELECT badcol, badcol::bytea FROM tab WHERE id = 12345;
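
One way to read the bytea output is sketched below in Python. It rests on several assumptions that are not confirmed in this thread: that the database server encoding is UTF8, that bytea is shown in the default hex format (for example \xe28099), and that the wrong encoding involved was LATIN9. The helper names parse_bytea_hex and looks_double_encoded are hypothetical, and the check is only a heuristic.

```python
def parse_bytea_hex(dump: str) -> bytes:
    """Turn hex-format bytea output such as '\\xe28099' into bytes."""
    if dump.startswith("\\x"):
        dump = dump[2:]
    return bytes.fromhex(dump)


def looks_double_encoded(dump: str) -> bool:
    """Heuristic: True if the bytes look like UTF-8 that was misread as
    LATIN9 and then converted to UTF-8 again (scenario 1, assuming a
    UTF8 database)."""
    raw = parse_bytea_hex(dump)
    try:
        text = raw.decode("utf-8")
        # Undo the suspected LATIN9 misreading and see whether valid
        # UTF-8 comes out the other end.
        repaired = text.encode("iso8859_15").decode("utf-8")
    except UnicodeError:
        # Either step failed, so the data was not mangled in this way.
        return False
    return repaired != text


print(looks_double_encoded(r"\xe28099"))        # a correctly stored U+2019
print(looks_double_encoded(r"\xc3a2c280c299"))  # U+2019 mangled via LATIN9
```

If the database itself is LATIN9 rather than UTF8, the stored bytes will look different (three single-byte characters e2 80 99 in place of the quote), so the result above would not apply as written.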

Yours,
Laurenz Albe

--
Cybertec | https://www.cybertec-postgresql.com
