
Re: Initial ugly reverse-translator

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc:	PgSQL General ML <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Initial ugly reverse-translator
Date:	2008-04-19 16:38:13
Message-ID:	10234.1208623093@sss.pgh.pa.us
Lists:	pgsql-general

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> Tom Lane wrote:
>> I don't really see the problem. I assume from your reference to pg_trgm
>> that you're using trigram similarity as the prefilter for potential
>> matches

> It turns out that's no good anyway, as it appears to ignore characters
> outside the ASCII range. Rather less than useful for searching a
> database of translated strings ;-)

A quick look at the pg_trgm code suggests that it is only prepared to
deal with single-byte encodings; if you're working in UTF8, which I
suppose you'd have to be, it's dead in the water :-(. Perhaps fixing
that should be on the TODO list.
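
To illustrate the symptom being described, here is a minimal Python sketch, not the pg_trgm C code. It assumes the documented pg_trgm convention of lowercasing words and padding each with two leading spaces and one trailing space; the function names are made up for this example, and `ascii_only_trigrams` merely mimics the reported behaviour of ignoring non-ASCII characters.

```python
import re


def char_trigrams(text: str) -> set:
    """Trigrams taken over whole Unicode characters, one word at a time."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        # Assumed padding convention: two spaces before, one after each word.
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def ascii_only_trigrams(text: str) -> set:
    """Hypothetical stand-in for the reported symptom: non-ASCII is dropped."""
    ascii_text = "".join(ch if ord(ch) < 128 else " " for ch in text)
    return char_trigrams(ascii_text)


def similarity(a: set, b: set) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With the ASCII-only variant, any two strings written entirely in a non-Latin script produce empty trigram sets, so nothing can be told apart; the character-based variant keeps them distinct.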

But in any case maybe the full-text-search stuff would be more useful
as a prefilter? Although honestly, for the speed we need here, I'm
not sure a prefilter is needed at all. Full text might be useful
if a LIKE-based match fails, though.
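
One way a LIKE-based match could be set up, sketched in Python under the assumption that the stored translations are printf-style templates (for example `relation "%s" does not exist`). The helper name `template_to_like` and the choice of backslash as the LIKE escape character are assumptions for illustration; only `%s`, `%d` and `%u` placeholders (optionally with a positional `N$`) are handled.

```python
import re

# "%%" is a literal percent in printf; "%s", "%d", "%u" (optionally "%1$s")
# are placeholders; "%", "_" and "\" are special inside a LIKE pattern.
_TOKEN = re.compile(r"%%|%(?:\d+\$)?[sdu]|[%_\\]")


def template_to_like(template: str) -> str:
    """Turn a printf-style template into a LIKE pattern with '\\' as escape."""
    def repl(match):
        tok = match.group(0)
        if tok == "%%":
            return "\\%"          # literal percent sign, escaped for LIKE
        if len(tok) > 1:
            return "%"            # placeholder becomes the LIKE wildcard
        return "\\" + tok         # escape a lone %, _ or backslash
    return _TOKEN.sub(repl, template)
```

The resulting pattern would be used as `message LIKE pattern` with the incoming message on the left-hand side, so each stored template is tried against the text the user pasted.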

>> (And besides, speed doesn't seem like the be-all and end-all here.)

> True. It's not so much the speed as the fragility when faced with small
> changes to formatting. In addition to whitespace, some clients mangle
> punctuation with features like automatic "curly"-quoting.

Yeah. I was wondering whether encoding differences wouldn't be a huge
problem in practice, as well.
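
A rough Python sketch of the kind of normalization that might reduce this fragility before any comparison is attempted. The mapping below is an assumption covering only the common curly quotes, and `normalize_message` is a hypothetical helper; real clients may mangle text in other ways.

```python
import re
import unicodedata

# Assumed set of "smart" quotes that some clients substitute automatically.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
})


def normalize_message(text: str) -> str:
    """Canonicalize Unicode form, straighten quotes, collapse whitespace."""
    # NFC makes composed and decomposed accented characters compare equal.
    text = unicodedata.normalize("NFC", text)
    text = text.translate(_QUOTE_MAP)
    # Any run of whitespace (including newlines) becomes one space.
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to both the stored strings and the incoming message means the match only has to cope with differences that survive normalization. It does nothing for text that arrives in the wrong encoding altogether; that would have to be decoded correctly first.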

regards, tom lane

In response to

Re: Initial ugly reverse-translator at 2008-04-19 16:04:22 from Craig Ringer

Responses

Re: Initial ugly reverse-translator at 2008-04-19 17:10:38 from Oleg Bartunov
Re: Initial ugly reverse-translator at 2008-04-19 18:16:19 from Craig Ringer
