From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fuzzy substring searching with the pg_trgm extension |
Date: | 2016-01-29 14:15:18 |
Message-ID: | 56AB73F6.7050200@sigaev.ru |
Lists: | pgsql-hackers |
> The behavior of this function is surprising to me.
>
> select substring_similarity('dog' , 'hotdogpound') ;
>
> substring_similarity
> ----------------------
> 0.25
>
Substring search was designed to find a similar word in a string:
contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
substring_similarity
----------------------
0.75
contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
substring_similarity
----------------------
1
It seems to me that users search for words in a long string. But I agree that a
more detailed explanation is needed and, maybe, we need to change the feature
name to fuzzywordsearch or something else; I can't think of a good one.
>
> Also, should we have a function which indicates the position in the
> 2nd string at which the most similar match to the 1st argument occurs?
>
> select substring_similarity_pos('dog' , 'hotdogpound') ;
>
> answering: 4
Interesting. I think it will be useful in some cases.
>
> We could call them <<-> and <->> , where the first corresponds to <%
> and the second to %>
Agreed.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/