Re: Initial ugly reverse-translator

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "pepone(dot)onrez" <pepone(dot)onrez(at)gmail(dot)com>
Cc: PgSQL General ML <pgsql-general(at)postgresql(dot)org>
Subject: Re: Initial ugly reverse-translator
Date: 2009-01-16 05:18:05
Message-ID: Pine.LNX.4.64.0901160816450.9554@sn.sai.msu.ru
Lists: pgsql-general

Hi,

ltree and pg_trgm with UTF8 support are available from CVS HEAD; see
http://archives.postgresql.org/pgsql-committers/2008-06/msg00356.php
http://archives.postgresql.org/pgsql-committers/2008-11/msg00139.php

Oleg
On Fri, 16 Jan 2009, pepone.onrez wrote:

> On Sat, Apr 19, 2008 at 6:10 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>> On Sat, 19 Apr 2008, Tom Lane wrote:
>>
>>> Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
>>>>
>>>> Tom Lane wrote:
>>>>>
>>>>> I don't really see the problem. I assume from your reference to pg_trgm
>>>>> that you're using trigram similarity as the prefilter for potential
>>>>> matches
>>>
>>>> It turns out that's no good anyway, as it appears to ignore characters
>>>> outside the ASCII range. Rather less than useful for searching a
>>>> database of translated strings ;-)
>>>
>>> A quick look at the pg_trgm code suggests that it is only prepared to
>>> deal with single-byte encodings; if you're working in UTF8, which I
>>> suppose you'd have to be, it's dead in the water :-(. Perhaps fixing
>>> that should be on the TODO list.
>>
>> As well as ltree. Both are on our TODO list:
>> http://www.sai.msu.su/~megera/wiki/TODO
>>
>
> Hi Oleg
>
> Your TODO list says that UTF8 support was added to ltree. Is this code
> currently available for download?
>
> Regards,
> José
>>>
>>> But in any case maybe the full-text-search stuff would be more useful
>>> as a prefilter? Although honestly, for the speed we need here, I'm
>>> not sure a prefilter is needed at all. Full text might be useful
>>> if a LIKE-based match fails, though.
>>>
>>>>> (And besides, speed doesn't seem like the be-all and end-all here.)
>>>
>>>> True. It's not so much the speed as the fragility when faced with small
>>>> changes to formatting. In addition to whitespace, some clients mangle
>>>> punctuation with features like automatic "curly"-quoting.
>>>
>>> Yeah. I was wondering whether encoding differences wouldn't be a huge
>>> problem in practice, as well.
>>>
>>> regards, tom lane
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
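
The encoding problem discussed in the quoted thread (trigram matching that
ignores characters outside the ASCII range) can be illustrated with a small
Python sketch. This is not pg_trgm's code. The helper names trigrams,
similarity and ascii_only are hypothetical, and ascii_only is only an
assumed model of the "non-ASCII characters are ignored" behaviour reported
above. The padding (two spaces before each word, one after) follows the
documented pg_trgm convention; splitting words on whitespace only is a
simplification.

```python
def trigrams(text):
    """Return the set of character trigrams of text.

    Each whitespace-separated word is lowercased and padded with two
    leading spaces and one trailing space before trigrams are taken.
    """
    result = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def ascii_only(text):
    """ASSUMPTION: model a matcher that discards non-ASCII characters
    by replacing each of them with a space."""
    return "".join(ch if ord(ch) < 128 else " " for ch in text)


if __name__ == "__main__":
    a, b = "привет мир", "привет мор"
    # Working on Unicode code points, the two strings share trigrams.
    print(similarity(a, b))
    # With non-ASCII characters discarded, no trigrams remain to compare.
    print(similarity(ascii_only(a), ascii_only(b)))
```

Operating on decoded characters rather than raw bytes is what keeps a
multi-byte UTF8 character inside a single trigram position. A byte-oriented
version would split such characters across trigrams or drop them entirely.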
