From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | bjdev(dot)gthb(at)laposte(dot)net |
Subject: | BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results |
Date: | 2024-10-14 09:14:50 |
Message-ID: | 18654-c09f568d3ba6dfcd@postgresql.org |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 18654
Logged by: bjdev
Email address: bjdev(dot)gthb(at)laposte(dot)net
PostgreSQL version: 15.4
Operating system: Ubuntu 22.04.5 LTS
Description:
Hi,
The fuzzystrmatch extension provides an implementation of the levenshtein
function.
One variant takes cost parameters:
levenshtein(source text, target text, ins_cost int, del_cost int, sub_cost int) returns int
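For reference, the quantity these three costs parameterize is the classic weighted edit distance, which can be computed with the Wagner-Fischer dynamic programme. The sketch below is a plain-Python illustration, not PostgreSQL's C implementation. The function name `weighted_levenshtein` is hypothetical, and its argument order simply mirrors the SQL signature (ins_cost, del_cost, sub_cost).

```python
def weighted_levenshtein(source: str, target: str,
                         ins_cost: int, del_cost: int, sub_cost: int) -> int:
    """Minimum total cost to turn `source` into `target` (Wagner-Fischer DP)."""
    # Row 0: turning source[:i] into the empty target prefix takes i deletions
    prev = [i * del_cost for i in range(len(source) + 1)]
    for j in range(1, len(target) + 1):
        # Turning the empty source prefix into target[:j] takes j insertions
        curr = [j * ins_cost]
        for i in range(1, len(source) + 1):
            same = source[i - 1] == target[j - 1]
            curr.append(min(
                prev[i] + ins_cost,                       # insert target[j - 1]
                curr[i - 1] + del_cost,                   # delete source[i - 1]
                prev[i - 1] + (0 if same else sub_cost),  # keep or substitute
            ))
        prev = curr  # only the previous row is needed
    return prev[-1]


# Hypothetical reference values; argument order mirrors the SQL call
for costs in [(1, 1, 1), (100, 10, 1), (1, 10, 100), (1, 10, 1)]:
    print(costs, weighted_levenshtein("horses", "shorse", *costs))
# (1, 1, 1) 2
# (100, 10, 1) 6
# (1, 10, 100) 11
# (1, 10, 1) 6
```

Keeping only two rows of the table means memory grows with the length of `source` rather than with the product of both lengths.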
However, when the function is called with costs other than 1 (the default),
the result is incorrect:
SELECT levenshtein('horses','shorse',1,1,1) => 2 (correct)
SELECT levenshtein('horses','shorse',100,10,1) => 101 (INCORRECT)
The correct result is 6: all six letters have to be substituted, and no other
combination of operations gives a lower score.
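The claim that no other combination of operations does better can be spelled out. Both strings have six characters, so an edit script needs as many insertions as deletions, and each such pair already costs ins_cost + del_cost = 110. A script without insertions or deletions can only substitute position by position, and "horses" and "shorse" differ at every position. The helper `substitution_only_cost` is a hypothetical name used only for this illustration:

```python
def substitution_only_cost(source: str, target: str, sub_cost: int) -> int:
    """Cost of an edit script that uses substitutions only (equal-length strings)."""
    if len(source) != len(target):
        raise ValueError("substitution-only scripts need equal-length strings")
    mismatches = sum(a != b for a, b in zip(source, target))
    return mismatches * sub_cost


ins_cost, del_cost, sub_cost = 100, 10, 1

only_subs = substitution_only_cost("horses", "shorse", sub_cost)  # 6 mismatches * 1
# With equal lengths, every insertion must be matched by a deletion,
# so any script that uses them costs at least ins_cost + del_cost.
pair_floor = ins_cost + del_cost

print(only_subs, pair_floor)  # 6 110
```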
This is easy to verify by hand, but it can also be checked with the Python
Levenshtein package:
from Levenshtein import distance
print(distance("horses", "shorse", weights=(100, 10, 1)))  # => 6
SELECT levenshtein('horses','shorse',1,10,100) => 12 (INCORRECT)
The correct result is 11: insert "s" at the start (+1) and remove the last "s" (+10).
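The two-step edit script for this case can be replayed directly to confirm that it produces the target string and costs 11. The cost constants follow the order of the SQL call (ins_cost=1, del_cost=10, sub_cost=100); by comparison, substituting all six mismatching letters would cost 6 × 100 = 600.

```python
ins_cost, del_cost, sub_cost = 1, 10, 100

word, cost = "horses", 0
word, cost = "s" + word, cost + ins_cost   # insert "s" at the start -> "shorses"
word, cost = word[:-1], cost + del_cost    # delete the trailing "s" -> "shorse"

print(word, cost)  # shorse 11
```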
Again, this is easy to verify by hand, or with the Python
implementation:
from Levenshtein import distance
print(distance("horses", "shorse", weights=(1, 10, 100)))  # => 11
SELECT levenshtein('horses','shorse',1,10,1) => 2 (INCORRECT)
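The correct result is 6: substituting all six letters costs 6 × 1 = 6, whereas
a single insertion/deletion pair would already cost 1 + 10 = 11.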
This can also be checked with the Python implementation:
from Levenshtein import distance
print(distance("horses", "shorse", weights=(1, 10, 1)))  # => 6
As a result, the cost parameters of the levenshtein function cannot be used
in practice, which is a shame.
Regards