Re: Levenshtein Distance with more than 255 characters

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Szymon Guz <mabewlun(at)gmail(dot)com>
Cc:	Janek Sendrowski <janek12(at)web(dot)de>, PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Levenshtein Distance with more than 255 characters
Date:	2013-09-06 06:47:34
Message-ID:	13969.1378450054@sss.pgh.pa.us
Lists:	pgsql-general

Szymon Guz <mabewlun(at)gmail(dot)com> writes:
> On 6 September 2013 01:00, Janek Sendrowski <janek12(at)web(dot)de> wrote:
>> I'm searching for an optimized Levenshtein Distance like PostgreSQL's. My
>> problem is that I want to compare strings with a length over 255 characters.
>> Does anyone know a solution?

> I'm not sure there is anything different from what you've found in
> core/contrib. But you can always use a PL/Python or PL/Perl procedure
> with some external library calculating the distance.

Well, you could just rebuild the fuzzystrmatch module with a different
value for MAX_LEVENSHTEIN_STRLEN. The comments in the code note that the
comparison cost is roughly O(N^2) in the string length, and the reason for
having a limit at all is to ensure the function runtime doesn't get out of
hand --- but it seems likely to me that 255 is an unnecessarily
conservative limit. If you wanted to do a few tests and report back on
just how slow it can get, we might be persuaded to raise the stock
setting.

regards, tom lane
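
For anyone who wants a rough feel for the O(N^2) growth mentioned above before rebuilding fuzzystrmatch, here is a minimal sketch in plain Python. It is not the fuzzystrmatch code and says nothing about its absolute speed; the function names levenshtein and time_lengths, the random lowercase test strings, and the chosen lengths are all assumptions made for illustration. It only shows how the running time of a textbook dynamic-programming implementation scales when the string length doubles past 255.

```python
import random
import string
import time


def levenshtein(a: str, b: str) -> int:
    """Textbook two-row dynamic-programming edit distance, unit costs.

    Illustrative only: this is not the fuzzystrmatch implementation.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def time_lengths(lengths, seed=0):
    """Time one comparison of two random strings for each length."""
    rng = random.Random(seed)
    results = []
    for n in lengths:
        s = "".join(rng.choices(string.ascii_lowercase, k=n))
        t = "".join(rng.choices(string.ascii_lowercase, k=n))
        start = time.perf_counter()
        levenshtein(s, t)
        results.append((n, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Hypothetical lengths: the stock 255 limit, then two doublings.
    for n, seconds in time_lengths([255, 510, 1020]):
        print(f"{n:5d} chars: {seconds:.4f} s")
```

With a quadratic algorithm, each doubling of the length should roughly quadruple the time. The same measurement against a rebuilt fuzzystrmatch would be the number that actually matters for deciding on a new limit.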
