From: | hamann(dot)w(at)t-online(dot)de |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Searching for "bare" letters |
Date: | 2011-10-02 07:31:18 |
Message-ID: | wolfgang-1111002093118.A0115664@amadeus3.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Reuven M. Lerner wrote:
>> <p>Hi, everyone. I'm working on a project on PostgreSQL 9.0 (soon
>> to be upgraded to 9.1, given that we haven't yet launched). The
>> project will involve numerous text fields containing English,
>> Spanish, and Portuguese. Some of those text fields will be
>> searchable by the user. That's easy enough to do; for our
>> purposes, I was planning to use some combination of LIKE searches;
>> the database is small enough that this doesn't take very much
>> time, and we don't expect the number of searchable records (or
>> columns within those records) to be all that large.</p>
>> <p>The thing is, the people running the site want searches to work
>> on what I'm calling (for lack of a better term) "bare" letters.
>> That is, if the user searches for "n", then the search should also
>> match Spanish words containing "ñ". I'm told by Spanish-speaking
>> members of the team that this is how they would expect searches to
>> work. However, when I just did a quick test using a UTF-8 encoded
>> 9.0 database, I found that PostgreSQL didn't see the two
>> characters as identical. (I must say, this is the behavior that I
>> would have expected, had the Spanish-speaking team member not said
>> anything on the subject.)</p>
>> <p>So my question is whether I can somehow wrangle PostgreSQL into
>> thinking that "n" and "ñ" are the same character for search
>> purposes, or if I need to do something else -- use regexps, keep a
>> "naked," searchable version of each column alongside the native
>> one, or something else entirely -- to get this to work.</p>
>> <p>Any ideas?</p>
>> <p>Thanks,</p>
>> <p>Reuven<br>
I had the same problem with german (there is ä ö ü)
I ended up with a normalized version of the database (for many purposes, this could
be just an extra column) plus preprocessing the input.
There is one difficulty with german searches: these letters are commonly transliterated into
ue etc, like in "Muenchen". So depending on culture, some people would expect a "u" search
term to match, and others the "ue". So preprocessing query means replacing bare u
(not followed by e) with a ue? regex
BTW: if your search form does not explicitly tell the browser to use utf8 to encode the search field,
you might expect a small proportion of iso-latin1 requests
Regards
Wolfgang
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Kellerer | 2011-10-02 07:45:59 | Re: SQL Help - Finding Next Lowest Value of Current Row Value |
Previous Message | Reuven M. Lerner | 2011-10-02 07:03:19 | Re: Searching for "bare" letters |