Re: Initial ugly reverse-translator

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PgSQL General ML <pgsql-general(at)postgresql(dot)org>
Subject: Re: Initial ugly reverse-translator
Date: 2008-04-19 16:04:22
Message-ID: 480A1806.2070006@postnewspapers.com.au
Lists: pgsql-general

Tom Lane wrote:

> I don't really see the problem. I assume from your reference to pg_trgm
> that you're using trigram similarity as the prefilter for potential
> matches

It turns out that's no good anyway, as it appears to ignore characters
outside the ASCII range. Rather less than useful for searching a
database of translated strings ;-)

> so a slow final LIKE match shouldn't be an issue really.
> (And besides, speed doesn't seem like the be-all and end-all here.)

True. It's not so much the speed as the fragility when faced with small
changes to formatting. In addition to whitespace, some clients mangle
punctuation with features like automatic "curly"-quoting.

> AFAICS you just need to translate %-string format escapes to %, quote
> any other % or _, and away you go.
>
> One thing that might be worth doing is avoiding spacing sensitivity,
> since whitespace is frequently mangled in copy-and-paste. Perhaps
> strip all spaces from both strings before matching?

Yep, that sounds pretty reasonable. As usual I'm making things more
complicated than they need to be. I suspect it'll be necessary to strip
quotes and some other punctuation too, but that's not a big deal.
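
For the archives, here is a rough sketch of the approach discussed above: turn each printf-style escape in the message template into a LIKE wildcard, quote any other % or _, and strip whitespace and quote characters from both sides before comparing. It is only an illustration under some assumptions. The function names, the set of characters stripped, and the use of an in-memory SQLite connection to evaluate LIKE (so the example runs without a server) are all my own choices for the sketch, not part of the actual reverse-translator.

```python
import re
import sqlite3

# Matches, in order: a literal "%%", a printf-style conversion such as
# %s, %-5d, %2$s, %.3f or %lu, or a lone character that LIKE treats specially.
_FORMAT_TOKEN = re.compile(
    r"(?P<literal>%%)"
    r"|(?P<escape>%(?:\d+\$)?[-+ #0']*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|L|j|z|t)?[diouxXeEfFgGaAcspm])"
    r"|(?P<special>[%_\\])"
)

# Assumption: whitespace plus a handful of straight and "curly" quote
# characters are what clients tend to mangle. Extend as needed.
_STRIP = re.compile(r"[\s\"'“”‘’«»„]+")


def normalize(text):
    """Drop whitespace and quote characters so formatting changes are ignored."""
    return _STRIP.sub("", text)


def template_to_like(template):
    """Convert a printf-style template into a normalized LIKE pattern.

    Format escapes become the % wildcard; a literal percent, underscore or
    backslash is quoted with a backslash (the ESCAPE character used below).
    """
    def replace(match):
        if match.group("escape"):
            return "%"
        if match.group("literal"):
            return "\\%"
        return "\\" + match.group("special")

    return normalize(_FORMAT_TOKEN.sub(replace, template))


def matches(message, template):
    """Return True if the pasted message fits the template under LIKE."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT ? LIKE ? ESCAPE '\\'",
            (normalize(message), template_to_like(template)),
        ).fetchone()
        return bool(row[0])
    finally:
        conn.close()


if __name__ == "__main__":
    template = 'relation "%s" does not exist'
    print(template_to_like(template))
    print(matches("relation “foo”  does not exist", template))
```

The conversion runs before the stripping step so that a space inside a format escape (the printf space flag) is consumed as part of the escape rather than removed first. In PostgreSQL the same pattern could be used directly in a LIKE comparison with an explicit ESCAPE clause.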

--
Craig Ringer
