Re: Improving docs for strict_word_similarity()

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-docs(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Improving docs for strict_word_similarity()
Date: 2018-06-01 15:39:11
Message-ID: CAPpHfdumsXfLUhtuiwDWU+Gf-KYkkqHCvMvRggYOugt-FBjfFg@mail.gmail.com
Lists: pgsql-docs

Hi, Bruce!

On Sat, May 26, 2018 at 7:56 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> While creating the release notes, I was confused by the description for
> strict_word_similarity(), particularly "extent boundaries". The
> attached patch clarifies, at least for me, how word_similarity() and
> strict_word_similarity() differ.
>

Thank you for your efforts to improve the pg_trgm documentation.
However, I don't think all of the changes are correct. Here are my notes on
the edits you propose.

--- 112,119 ----
</entry>
<entry><type>real</type></entry>
<entry>
! Same as <function>word_similarity(text, text)</function>, but
! considers the set of trigrams to be of the same length.
</entry>
</row>
<row>

This doesn't look like a correct description. In short,
strict_word_similarity() searches the second string for the extent of words
that best matches the first string. So, this function always uses whole
words from the second string, not parts of words. However, this is not a
matter of the length of the trigram sets.
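
To make the "extent of words" idea concrete, here is a minimal Python sketch. It is a simplified model, not the pg_trgm implementation: it assumes words are runs of alphanumeric characters, that each word is padded with two leading spaces and one trailing space, and that similarity is shared trigrams divided by total distinct trigrams. The name strict_word_similarity_model is made up for this illustration.

```python
import re

def word_trigrams(word):
    """Trigrams of one lowercased word, padded as '  word ' (assumed padding)."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def words_trigrams(words):
    """Union of the trigram sets of several words."""
    result = set()
    for w in words:
        result |= word_trigrams(w)
    return result

def strict_word_similarity_model(query, text):
    """Best set similarity between the query and any contiguous run of
    whole words in text (a model of the behaviour, not the real code)."""
    query_set = words_trigrams(re.findall(r"[^\W_]+", query))
    words = re.findall(r"[^\W_]+", text)
    best = 0.0
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            run_set = words_trigrams(words[start:end])
            union = query_set | run_set
            if union:
                best = max(best, len(query_set & run_set) / len(union))
    return best

print(round(strict_word_similarity_model("word", "two words"), 6))  # 0.571429
```

In this model the best run is the single word 'words': 4 shared trigrams out of 7 distinct ones. The candidates differ in which whole words they include, not in whether the two trigram sets have the same length.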

--- 164,182 ----
This function returns a value that can be approximately understood as the
greatest similarity between the first string and any substring of the second
string. However, this function does not add padding to the boundaries of
! the extent. Thus, the number of additional characters present in the
! second string is not considered, except for the mismatched word boundry.
</para>

This looks correct to me.
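
For comparison, word_similarity() can be modelled as searching extents of the ordered trigram list rather than runs of whole words. The sketch below is an approximation under the same assumptions as the documentation example (two leading spaces and one trailing space of padding per word, alphanumeric words); word_similarity_model is a hypothetical name and the real algorithm differs in its details.

```python
import re

def ordered_trigrams(text):
    """Trigrams of every word in text, kept in order of appearance."""
    result = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # assumed padding
        result.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def word_similarity_model(query, text):
    """Best set similarity between the query trigrams and any contiguous
    extent of the ordered trigrams of text (a simplified model)."""
    query_set = set(ordered_trigrams(query))
    sequence = ordered_trigrams(text)
    best = 0.0
    for start in range(len(sequence)):
        for end in range(start + 1, len(sequence) + 1):
            extent = set(sequence[start:end])
            best = max(best, len(query_set & extent) / len(query_set | extent))
    return best

print(word_similarity_model("word", "two words"))  # 0.8
```

Here the best extent is {"  w"," wo","wor","ord"}, which stops in the middle of 'words'. The trailing "rds" and "ds " trigrams are left out, so the extra characters of the second string cost nothing beyond the one mismatched boundary trigram.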

! The function <function>strict_word_similarity(text, text)</function>
! does consider additional characters in the second string. In the
! example above, <function>strict_word_similarity(text, text)</function>
! would use the full trigram for the second string when computing
! similarity, not just the part of the trigram that matches the
! first string. For example, it would use the
! <literal>{"  w"," wo","wor","ord","rds","ds "}</literal>, which corresponds to the whole
! word <literal>'words'</literal>.

After your edits, it looks like strict_word_similarity() matches the full
set of first-string trigrams against the full set of second-string trigrams.
However, that is a description of the plain similarity() function. Actually,
strict_word_similarity() matches the set of trigrams of the first string
against the set of trigrams of a contiguous subset of the second string's
words.
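
The difference between the two descriptions can be seen by scoring every contiguous run of words separately. The following Python sketch is illustrative only: it uses the same simplified trigram model (alphanumeric words, two leading spaces and one trailing space of padding, shared trigrams over distinct trigrams), and score_word_runs is a hypothetical helper, not part of pg_trgm.

```python
import re

def words_trigrams(words):
    """Union of the trigram sets of the given words (assumed padding)."""
    result = set()
    for word in words:
        padded = "  " + word.lower() + " "
        result |= {padded[i:i + 3] for i in range(len(padded) - 2)}
    return result

def score_word_runs(query, text):
    """Map each contiguous run of words in text to its set similarity
    with the query, under this simplified model."""
    query_set = words_trigrams(re.findall(r"[^\W_]+", query))
    words = re.findall(r"[^\W_]+", text)
    scores = {}
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            run = tuple(words[start:end])
            run_set = words_trigrams(run)
            scores[run] = len(query_set & run_set) / len(query_set | run_set)
    return scores

scores = score_word_runs("word", "two words")
for run, score in scores.items():
    print(run, round(score, 6))

# The run covering every word is the full-set comparison; the strict
# variant, as modelled here, takes the best run instead.
print(round(max(scores.values()), 6))
```

In this model the run ('two', 'words') scores 4/11, which is the full-set-against-full-set comparison, while the run ('words',) scores 4/7. Taking the maximum over runs is what separates the strict word variant from a plain full-string comparison.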

--- 189,197 ----

<para>
Thus, the <function>strict_word_similarity(text, text)</function> function
! is useful for finding the similarity to whole words, while
<function>word_similarity(text, text)</function> is more suitable for
! finding the similarity for parts of words.
</para>

This also looks correct to me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
