pg_trgm word_similarity inconsistencies or bug

From:	Cristiano Coelho <cristianocca(at)hotmail(dot)com>
To:	"pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject:	pg_trgm word_similarity inconsistencies or bug
Date:	2017-10-27 18:48:08
Message-ID:	CY4PR17MB13207ED8310F847CF117EED0D85A0@CY4PR17MB1320.namprd17.prod.outlook.com
Lists:	pgsql-bugs pgsql-hackers

Hello all, this concerns PostgreSQL 9.6 (9.6.4). A good description can be found here: https://stackoverflow.com/questions/46966360/postgres-word-similarity-not-comparing-words

In summary, word_similarity doesn't seem to do exactly what the docs say: it matches trigrams across multiple words rather than doing a word-by-word comparison.

Below is a table with the actual output (word_similarity) and the expected output (my_word_similarity), thanks to kiln from Stack Overflow for providing it.

with data(t) as (
    values
        ('message'),
        ('message s'),
        ('message sag'),
        ('message sag sag'),
        ('message sag sage')
)
select t, word_similarity('sage', t), my_word_similarity('sage', t)
from data;

        t         | word_similarity | my_word_similarity
------------------+-----------------+--------------------
 message          |             0.6 |                0.3
 message s        |             0.8 |                0.3
 message sag      |               1 |                0.5
 message sag sag  |               1 |                0.5
 message sag sage |               1 |                  1
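
To make the arithmetic behind the two columns concrete, here is a minimal Python sketch. It is not the pg_trgm C code. It assumes that each word is padded with two leading spaces and one trailing space before trigrams are taken, that similarity is the shared-over-union ratio of the two trigram sets, and that word_similarity effectively scores the query against contiguous runs of the text's trigram sequence, so a run can span a word boundary. The function names are made up for illustration, words are split on whitespace only, and my_word_similarity is not defined in this message, so per_word_similarity is only a guess at its intent.

```python
def trigrams(word):
    """Trigrams of one word, padded with two leading spaces and one trailing space."""
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def text_trigrams(text):
    """Ordered trigram sequence of a text, built word by word (whitespace split only)."""
    seq = []
    for word in text.split():
        seq.extend(trigrams(word))
    return seq


def jaccard(a, b):
    """Shared trigrams divided by all distinct trigrams of both inputs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def extent_similarity(query, text):
    """Best score of the query against any contiguous run of the text's trigrams.

    Assumed model of the observed word_similarity behaviour: runs may cross
    word boundaries, which is what lets 'message s' outscore 'message'.
    """
    q = text_trigrams(query)
    seq = text_trigrams(text)
    best = 0.0
    for i in range(len(seq)):
        for j in range(i + 1, len(seq) + 1):
            best = max(best, jaccard(q, seq[i:j]))
    return best


def per_word_similarity(query, text):
    """Best score of the query against each single word of the text (the expected behaviour)."""
    q = text_trigrams(query)
    return max(jaccard(q, trigrams(word)) for word in text.split())


if __name__ == "__main__":
    rows = ["message", "message s", "message sag",
            "message sag sag", "message sag sage"]
    for t in rows:
        print(t, round(extent_similarity("sage", t), 1),
              round(per_word_similarity("sage", t), 1))
```

Under these assumptions the sketch reproduces both columns of the table. For 'message s', the run "sag", "age", "ge ", "  s" shares 4 of the 5 trigrams of 'sage' and adds no new ones, giving 4/5 = 0.8, even though "  s" comes from a different word than the other three. Compared word by word, 'message' shares only 3 trigrams out of 10 distinct ones, giving 0.3.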

Responses

Re: pg_trgm word_similarity inconsistencies or bug at 2017-10-28 08:22:29 from Arthur Zakirov
