From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Jim Finnerty" <jfinnert(at)amazon(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: insensitive collations |
Date: | 2021-04-03 20:23:45 |
Message-ID: | 3edec684-85a8-40b7-a47c-16c54d6eb54f@manitou-mail.org |
Lists: | pgsql-hackers |
Jim Finnerty wrote:
> SET client_encoding = WIN1252;
> [...]
> postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt__'; --
> the wildcard is applied byte by byte instead of character by character, so
> the 2-byte accented character is matched only by 2 '_'s
> location
> ----------------
> Franche-Comté
> (1 row)
The most plausible explanation is that the client-side text is encoded
in UTF-8, rather than WIN1252 as declared.
If you added length('Franche-Comté') to the above query, I suspect
it would report that the string is one character longer than
expected, and octet_length('Franche-Comté') would be
two bytes longer than expected.
Also, dumping the contents of the "location" column with
convert_to() would show that the accented characters have been
wrongly translated, if the encoding snafu is indeed the
explanation.
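To illustrate the arithmetic, here is a minimal Python sketch of what would presumably happen if the client sent UTF-8 bytes while declaring WIN1252. It only simulates the conversions outside PostgreSQL, and it assumes a UTF8 database encoding, which is not stated in the original report; the variable names are made up for illustration.

```python
# Simulation only: mimics the suspected double conversion, not PostgreSQL itself.
# Assumption: the database encoding is UTF8 (not stated in the report).
text = "Franche-Comté"

# What the client actually puts on the wire if its terminal uses UTF-8.
wire_bytes = text.encode("utf-8")

# How the server reads those bytes when client_encoding is declared as WIN1252:
# each of the two bytes of "é" becomes a separate character.
misread = wire_bytes.decode("cp1252")

# What ends up stored after conversion to the (assumed) UTF8 database encoding.
stored = misread.encode("utf-8")

expected_chars, expected_bytes = len(text), len(text.encode("utf-8"))
actual_chars, actual_bytes = len(misread), len(stored)

print(expected_chars, expected_bytes)  # 13 14
print(actual_chars, actual_bytes)      # 14 16
print(stored[-4:].hex())               # c383c2a9 instead of c3a9
```

Under these assumptions the stored value has one extra character and two extra bytes, which would also explain why two '_' wildcards are needed to match what looks like a single accented letter.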
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite