Re: problem with query

From: Giuseppe Broccolo <giuseppe(dot)broccolo(at)2ndquadrant(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: problem with query
Date: 2013-09-13 14:49:55
Message-ID: 52332613.2020904@2ndquadrant.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Il 12/09/2013 22:34, Roberto Scattini ha scritto:
> hi, today we discovered that this query doesn't return the expected
> values:
>
> SELECT DISTINCT
> p.id <http://p.id>, p.apellido AS "Apellido", p.nombre AS "Nombre",
> pf.nombre AS "Funcion", to_char(da.f_ingreso_pg, 'dd/mm/yyyy') AS
> "Fecha Ingreso PG", e.estado AS "Estado", to_char(pe.fecha,
> 'dd/mm/yyyy') AS "Fecha Estado"
> FROM personal.personas AS p
> LEFT JOIN personal.personas_da_pg AS da ON p.id
> <http://p.id>=da.id_persona
> LEFT JOIN personal.personas_estado AS pe ON da.estado_id=pe.id
> <http://pe.id>
> LEFT JOIN personal.estados AS e ON pe.id_estado=e.id <http://e.id>
> LEFT JOIN procu_departamento AS d ON d.id
> <http://d.id>=da.id_departamento
> LEFT JOIN procu_departamento AS dto ON left(d.c_organigrama,
> 4)||'000'=dto.c_organigrama
> LEFT JOIN procu_departamento AS dir ON left(d.c_organigrama,
> 3)||'0000'=dir.c_organigrama
> LEFT JOIN procu_departamento AS dg ON left(d.c_organigrama,
> 2)||'00000'=dg.c_organigrama
> LEFT JOIN personal.funciones AS pf ON pf.id <http://pf.id>=da.funcion_id
> LEFT JOIN personal.profesiones AS pp ON pp.id
> <http://pp.id>=p.id_profesion
> WHERE p.apellido ilike '%nuñez%'
> ORDER BY "Apellido"
The problem is the encoding: SQL_ASCII encodes only the first 128
characters, so 'ñ' and 'Ñ' cannot be encoded in ASCII. If you insert
text 'ñ' or 'Ñ' in a table inside a database (e.g. 'ascii_test') with
SQL_ASCII encoding, they will be seen as two distinct characters:

ascii_test=# SELECT length(E'ñ');
length
--------
2
(1 row)
ascii_test=# SELECT length(E'Ñ');
length
--------
2
(1 row)
ascii_test=# SELECT 'ñ'::bytea;
bytea
--------
\xc3b1
(1 row)
ascii_test=# SELECT 'Ñ'::bytea;
bytea
--------
\xc391
(1 row)

Hexadecimals 'c3', 'b1' and '91' does not decode anything in ASCII, even
if terminal show pairs as 'ñ' or 'Ñ'. So ILIKE operator cannot apply
case insensitive pattern matching to data which does not encode any
string (in the SQL_ASCII encoding) and works as a normal LIKE.

Even if the client which insert data has 8-bit encoding (UTF8, where 'ñ'
and 'Ñ' are decoded), SQL_ASCII database cannot convert strings anyway,
and problem remains.

The best way is to work with a UTF8 encoded database. Is there a
particular reason to work with SQL_ASCII encoding?

Giuseppe.

--
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
giuseppe(dot)broccolo(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Roberto Scattini 2013-09-13 14:57:01 Re: problem with query
Previous Message Chris Travers 2013-09-13 14:30:40 Re: Best way to populate nested composite type from JSON`