Re: similarity and operator '%'

From:	"David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To:	Volker Boehm <volker(at)vboehm(dot)de>
Cc:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: similarity and operator '%'
Date:	2016-05-30 18:20:33
Message-ID:	CAKFQuwaEUYS75qJVGQZJ7FGDZtM+kMrzTQzdPNLFhxVk+vQkDg@mail.gmail.com
Lists:	pgsql-performance

On Mon, May 30, 2016 at 1:53 PM, Volker Boehm <volker(at)vboehm(dot)de> wrote:

>
> The reason for using the similarity function in place of the '%'-operator
> is that I want to use different similarity values in one query:
>
> select name, street, zip, city
> from addresses
> where name % $1
> and street % $2
> and (zip % $3 or city % $4)
> or similarity(name, $1) > 0.8
>
> which means: take all addresses where name, street, zip and city have
> little similarity _plus_ all addresses where the name matches very good.
>
>
> The only way I found, was to create a temporary table from the first
> query, change the similarity value with set_limit() and then select the
> second query UNION the temporary table.
>
> Is there a more elegant and straight forward way to achieve this result?
>

Not that I can envision.

You are forced into using an operator due to our index implementation.

You are thus forced into using a GUC to control the parameter that the
index scanning function uses to compute true/false.

A GUC can only take on a single value within a given query - well, not
quite true[1] but the exception doesn't seem like it will help here.

Thus you are consigned to using two queries.
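
A minimal sketch of that two-query approach, following the temporary-table workaround described in the quoted message. The literal search values, the 0.3 loose threshold, and the table name loose_matches are illustrative assumptions; set_limit() is the pg_trgm function that sets the threshold used by the % operator. Using % at the 0.8 threshold only approximates the original "similarity(name, $1) > 0.8" condition at the boundary value.

```sql
BEGIN;

-- Loose threshold for the multi-column match (0.3 is an assumed value).
SELECT set_limit(0.3);

-- Hypothetical temp table holding the loosely matching rows.
CREATE TEMPORARY TABLE loose_matches ON COMMIT DROP AS
SELECT name, street, zip, city
FROM addresses
WHERE name % 'Volker Boehm'
  AND street % 'Hauptstrasse 1'
  AND (zip % '12345' OR city % 'Berlin');

-- Strict threshold for the name-only match.
SELECT set_limit(0.8);

SELECT name, street, zip, city
FROM addresses
WHERE name % 'Volker Boehm'
UNION
SELECT name, street, zip, city
FROM loose_matches;

COMMIT;
```

Both SELECTs on addresses keep the % operator in the WHERE clause, so each one remains eligible for a trigram index scan.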

*A functional index doesn't work since the second argument is query-specific.

[1] When defining a function you can attach a "SET" clause to it; commonly
used for search_path but should work with any GUC. If you could wrap the
operator comparison into a custom function you could use this capability.
It also would require a function that would take the threshold as a value -
the extension only provides variations that use the GUC.

I don't think this will use the index even if it compiles (not tested):

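-- Parameters need types; text is assumed here.
CREATE FUNCTION similarity_80(col text, val text)
RETURNS boolean
-- The GUC is named pg_trgm.similarity_threshold (PostgreSQL 9.6 and later).
SET pg_trgm.similarity_threshold = 0.80
LANGUAGE sql
AS $$
SELECT col % val;
$$;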
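
One way to check whether the wrapper can use the index is to compare plans with EXPLAIN. This is only a sketch: it assumes the function above was created successfully, that a trigram index exists on addresses(name) (the index name below is hypothetical), and that the table is large enough for the planner to prefer an index scan at all.

```sql
-- Assumed trigram index; the name is made up for illustration.
CREATE INDEX addresses_name_trgm_idx
    ON addresses USING gin (name gin_trgm_ops);

-- Bare operator: eligible for the trigram index.
EXPLAIN SELECT name FROM addresses WHERE name % 'Boehm';

-- Wrapped in the function: compare the plan with the one above.
EXPLAIN SELECT name FROM addresses WHERE similarity_80(name, 'Boehm');
```

If the second plan shows a sequential scan with the function call as a filter while the first shows an index scan, the wrapper is not being matched against the index.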

David J.

In response to

similarity and operator '%' at 2016-05-30 17:53:59 from Volker Boehm
