Re: Searching for "bare" letters

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Uwe Schroeder <uwe(at)oss4u(dot)com>
Cc: "Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>, pgsql-general(at)postgresql(dot)org
Subject: Re: Searching for "bare" letters
Date: 2011-10-02 09:35:55
Message-ID: Pine.LNX.4.64.1110021333280.26195@sn.sai.msu.ru
Lists: pgsql-general

I don't see the problem: you can have a dictionary that does all the work of
recognizing bare letters and outputs several versions. Have you seen the
unaccent dictionary?
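
As a rough illustration of what such a dictionary does, here is a minimal Python sketch. It is not the unaccent dictionary itself, only an approximation based on Unicode decomposition, and the helper names `strip_accents` and `matches_bare` are hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits a precomposed letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) and keep everything else.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def matches_bare(query: str, stored: str) -> bool:
    """Case-insensitive substring match that ignores accents on both sides."""
    return strip_accents(query).casefold() in strip_accents(stored).casefold()
```

With this sketch, `matches_bare("n", "Pe\u00f1a")` is true while the stored value keeps its accent for display. Letters that have no Unicode decomposition (for example "\u00f8") pass through unchanged, so a real dictionary needs explicit rules for them.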

Oleg
On Sun, 2 Oct 2011, Uwe Schroeder wrote:

>> Hi, everyone. Uwe wrote:
>>> What kind of "client" are the users using? I assume you will have some
>>> kind of user interface. For me this is a typical job for a user
>>> interface. The number of letters with "equivalents" in different
>>> languages is extremely limited, so a simple matching routine in the
>>> user interface should give you a way to issue the proper query.
>>
>> The user interface will be via a Web application. But we need to store
>> the data with the European characters, such as ñ, so that we can display
>> them appropriately. So much as I like your suggestion, we need to do
>> the opposite of what you're saying -- namely, take a bare letter, and
>> then search for letters with accents and such on them.
>>
>> I am beginning to think that storing two versions of each name, one bare
>> and the other not, might be the easiest way to go. But hey, I'm open
>> to more suggestions.
>>
>> Reuven
>
>
> That still doesn't hinder you from using a matching algorithm. Here is a simple
> example (as I understand the problem):
> You have texts stored in the db containing both an "n" and an "ñ". Now a client
> enters "n" on the website. What you want to do is look for both variations, so
> "n" translates into "n" or "ñ".
> There you have it. In the routine that receives the request you have a
> matching method that matches on "n" (or any of the few other characters with
> equivalents), and the routine will issue a query with xx LIKE '%n%' OR xx
> LIKE '%ñ%' (personally I would use ILIKE, since that eliminates the case
> problem).
>
> Since you're referring to a "name", I don't know the specifics of the
> problem or data layout, but from what I know I think you can tackle this with a
> rather primitive "match -> translate to" kind of algorithm.
>
> One thing I'd not do: store duplicate versions. There's always a way to deal
> with data the way it is. In my opinion storing different versions of the same
> data just bloats a database in favor of a smarter way to deal with the initial
> data.
>
> Uwe
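
The "match -> translate to" routine Uwe describes could be sketched roughly as follows. This is only an illustration under stated assumptions: the equivalence table, the table name `people`, and the function names are hypothetical, the accented letter is assumed to be n with tilde, and the `%s` placeholders assume a psycopg2-style driver.

```python
# Hypothetical equivalence table: bare letter -> spellings to search for.
# "\u00f1" (n with tilde) is an assumption about the letter in question.
EQUIVALENTS = {
    "n": ["n", "\u00f1"],
}


def expand(term, equivalents=EQUIVALENTS):
    """Return every spelling of term obtained by swapping in equivalents."""
    variants = [""]
    for ch in term:
        # Letters without an entry in the table are kept as they are.
        options = equivalents.get(ch.lower(), [ch])
        variants = [prefix + opt for prefix in variants for opt in options]
    return variants


def build_query(column, term):
    """Build an ILIKE query with one pattern per variant.

    column must be a trusted identifier; only the search patterns are
    passed as parameters. LIKE wildcards in term are not escaped here.
    """
    variants = expand(term)
    where = " OR ".join(f"{column} ILIKE %s" for _ in variants)
    params = [f"%{v}%" for v in variants]
    return f"SELECT * FROM people WHERE {where}", params
```

For the single letter "n" this yields two ILIKE patterns, one per spelling. The number of variants doubles with every letter that has an equivalent, so the approach suits short search terms better than long ones.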

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
