unicode searches failing that use % and LIKE operators

From: Benjamin Weaver <benjamin(dot)weaver(at)classics(dot)ox(dot)ac(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: unicode searches failing that use % and LIKE operators
Date: 2007-10-22 18:00:38
Message-ID: 20071022180038.CE858EB04D@webmail221.herald.ox.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dear all,

I have the following problem: a compound search, involving 2 wildcarded
character search terms, in which one search term consists of Latin characters
and the other, of UTF-8 unicode Greek characters, fails. This is strange,
because similar searches in which both terms are either unicode Greek or Latin
characters succeed.

Both terms query a column of type text. Searches of this kind fail both via
JDBC and the PSQL interface via an xterm window (which handles unicode
properly). The JDBC search will have put both terms in UTF-8. The column is
called metadatafulltext. The search uses the LIKE operator in conjunction with
the wildcard character:

For example, the following search fails:

SELECT ..

FROM metadatafulltext...

WHERE metadatafulltext LIKE '%Jones%' AND metadatafulltext LIKE '%ALPHABETA%'
(where ALPHABETA is actually a unicode Greek string (\u03b1\u03b2).

whereas searches using all Greek characters succeed:

WHERE metadatafulltext LIKE '%BETAEPSILONDELTA%' AND metadatafulltext LIKE
'%ALPHABETA%'

and equally, all-Latin searches also succeed:
WHERE metadatafulltext LIKE '%Jones%' AND metadatafulltext LIKE '%Smith%'

What must I do to ensure that mixed-term searches of the first kind succeed?

Thanks in advance,

Ben Weaver

--
Benjamin Weaver
Faculty Research Associate, Imaging Papyri Projects, Herculaneum Society, Oxford
email: benjamin(dot)weaver(at)classics(dot)ox(dot)ac(dot)uk
phone: (0)1865 610236

Responses

Browse pgsql-general by date

  From Date Subject
Next Message D. Dante Lorenso 2007-10-22 18:02:55 Problem with BYTEA, CAST, and pg_dump
Previous Message Martin Marques 2007-10-22 17:22:00 Bitmap Heap scan 8.1/8.2