Re: Improving docs for strict_word_similarity()

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-docs(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Improving docs for strict_word_similarity()
Date: 2018-06-12 18:08:39
Message-ID: CAPpHfds38hGF9_Qs3Up4Dx4vuvVWJkqdCyAPYo7nZvo_5eebkA@mail.gmail.com
Lists: pgsql-docs

On Fri, Jun 1, 2018 at 6:39 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> On Sat, May 26, 2018 at 7:56 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>
>> While creating the release notes, I was confused by the description for
>> strict_word_similarity(), particularly "extent boundaries". The
>> attached patch clarifies, at least for me, how word_similarity() and
>> strict_word_similarity() differ.
>
>
> Thank you for your efforts to improve the documentation of pg_trgm.
> However, I don't find all of them correct. I have the following notes on
> the edits you propose.
>
> --- 112,119 ----
> </entry>
> <entry><type>real</type></entry>
> <entry>
> ! Same as <function>word_similarity(text, text)</function>, but
> ! considers the set of trigrams to be of the same length.
> </entry>
> </row>
> <row>
>
> This doesn't look like a correct description. In short, strict_word_similarity()
> searches for the extent of words in the second string that best matches the
> first string. So this function takes care to use whole words from the second
> string, not parts of words. However, this is not a matter of the length of the trigram sets.
>
> --- 164,182 ----
> This function returns a value that can be approximately understood as the
> greatest similarity between the first string and any substring of the second
> string. However, this function does not add padding to the boundaries of
> ! the extent. Thus, the number of additional characters present in the
> ! second string is not considered, except for the mismatched word boundary.
> </para>
>
> This looks correct to me.
>
> ! The function <function>strict_word_similarity(text, text)</function>
> ! does consider additional characters in the second string. In the
> ! example above, <function>strict_word_similarity(text, text)</function>
> ! would use the full trigram for the second string when computing
> ! similarity, not just the part of the trigram that matches the
> ! first string. For example, it would use the set
> ! <literal>{"  w"," wo","wor","ord","rds","ds "}</literal>, which
> ! corresponds to the whole word <literal>'words'</literal>.
>
> After your edits, it looks like strict_word_similarity() matches the full
> set of the first string's trigrams to the full set of the second string's
> trigrams. However, that describes just the similarity() function. Actually,
> strict_word_similarity() matches the set of trigrams of the first string to
> the set of trigrams of a contiguous run of whole words from the second string.
>
> --- 189,197 ----
>
> <para>
> Thus, the <function>strict_word_similarity(text, text)</function> function
> ! is useful for finding the similarity to whole words, while
> <function>word_similarity(text, text)</function> is more suitable for
> ! finding the similarity for parts of words.
> </para>
>
> This also looks correct to me.
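
A minimal Python model of the distinction discussed above, assuming pg_trgm's default behavior: words are lowercase alphanumeric runs, and each word is padded with two leading spaces and one trailing space before trigrams are taken. similarity() is modeled as shared trigrams divided by all distinct trigrams of both strings. The helper strict_word_similarity_model() is hypothetical: it scores only contiguous runs of whole words from the second string, which is a simplified model of strict_word_similarity(), not pg_trgm's C implementation.

```python
import re
from typing import List, Set


def words(s: str) -> List[str]:
    # Assumption: approximates pg_trgm's default split on non-alphanumeric chars.
    return re.findall(r"[^\W_]+", s.lower())


def word_trigrams(word: str) -> Set[str]:
    # Pad with two leading spaces and one trailing space, then take 3-char windows.
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigrams(s: str) -> Set[str]:
    # Union of the trigram sets of every word in the string.
    result: Set[str] = set()
    for w in words(s):
        result |= word_trigrams(w)
    return result


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Shared trigrams / distinct trigrams overall (0.0 when both sets are empty).
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity(s1: str, s2: str) -> float:
    # Model of similarity(): full trigram set vs. full trigram set.
    return jaccard(trigrams(s1), trigrams(s2))


def strict_word_similarity_model(s1: str, s2: str) -> float:
    # Hypothetical model: best score over every contiguous run of whole words
    # in s2, so an extent can never start or end in the middle of a word.
    t1 = trigrams(s1)
    ws = words(s2)
    best = 0.0
    for start in range(len(ws)):
        extent: Set[str] = set()
        for end in range(start, len(ws)):
            extent |= word_trigrams(ws[end])
            best = max(best, jaccard(t1, extent))
    return best


print(sorted(word_trigrams("words")))  # ['  w', ' wo', 'ds ', 'ord', 'rds', 'wor']
print(round(similarity("word", "words"), 6))  # 0.571429
print(round(strict_word_similarity_model("word", "two words"), 6))  # 0.571429
```

In this model the best run for ('word', 'two words') is the whole word 'words', giving 4 shared trigrams out of 7 distinct ones (4/7). A part-word extent such as {"  w"," wo","wor","ord"} would score 4/5 under the same ratio, but per the notes above strict_word_similarity() only considers extents made of whole words, so the model never picks it.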

I've edited the places that looked incorrect to me, trying my best to make
them as clear as possible. Bruce, could you please take a look at them?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: pg-trgm-doc-2.patch (application/octet-stream, 2.3 KB)
