Re: [pg_trgm] Making similarity(?, ?) < ? use an index

From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Greg Navis <contact(at)gregnavis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [pg_trgm] Making similarity(?, ?) < ? use an index
Date: 2016-06-11 20:40:00
Message-ID: CAKNkYnwjAZiyi1_Ef_qi5LdVJpe7KZnTbv4chtaRkP2s9GruJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2016-06-11 13:47 GMT+03:00 Greg Navis <contact(at)gregnavis(dot)com>:

> I made some progress but I'm stuck. I'm focused on GiST for now. Please
> ignore sloppy naming for now.
>
> I made the following changes to pg_trgm--1.2.sql:
>
> CREATE TYPE pg_trgm_match AS (match TEXT, threshold REAL);
>
> CREATE OR REPLACE FUNCTION trgm_check_match(string TEXT, match
> pg_trgm_match) RETURNS bool AS $$
> BEGIN
> RETURN match.match <-> string <= 1 - match.threshold;
> END;
> $$ LANGUAGE plpgsql;
>
> CREATE OPERATOR %%(leftarg = text, rightarg = pg_trgm_match,
> procedure=trgm_check_match);
>
> ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
> OPERATOR 9 %% (text,
> pg_trgm_match);
>

You can overload existing % operator:

ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
OPERATOR 9 % (text, pg_trgm_match);

>
> It does indeed make PostgreSQL complain about undefined strategy 9. I
> added the following define to trgm.h:
>
> #define ThresholdStrategyNumber 9
>
> It seems StrategyNumber is used in gtrgm_consistent and gtrgm_distance.
>
> In gtrgm_consistent, I need change the way `nlimit` is obtained:
>
> nlimit = (strategy == SimilarityStrategyNumber) ?
> similarity_threshold : word_similarity_threshold;
>
> I need to add a case for ThresholdStrategyNumber and extract `nlimit` from
> the argument of `pg_trgm_match`. I'm not sure what to do in
> `gtrgm_distance`.
>
> My questions:
>
> 1a. Is it possible to make `gtrgm_consistent` accept `text` or
> `pg_trgm_match` as the second argument?
>

I think you can change definition of the gtrgm_consistent() in .sql file in
CREATE FUNCTION and CREATE OPERATOR CLASS commands to:

gtrgm_consistent(internal,anynonarray,smallint,oid,internal)

But I do not sure that anynonarray is good here.

> 1b. What's the equivalent of `match.match` and `match.threshold` (where
> `match` is a `pg_trgm_match`) in C?
>

After changing the definition you can extract values from composite type in
the gtrgm_consistent(). I think the code in the beginning of function may
looks like this:

if (strategy == SimilarityStrategyNumber ||
strategy == WordSimilarityStrategyNumber)
{
query = PG_GETARG_TEXT_P(1);
nlimit = (strategy == SimilarityStrategyNumber) ?
similarity_threshold : word_similarity_threshold;
}
else if (strategy == ThresholdStrategyNumber)
{
HeapTupleHeader query_match = PG_GETARG_HEAPTUPLEHEADER(1);
Oid tupType = HeapTupleHeaderGetTypeId(query_match);
int32 tupTypmod = HeapTupleHeaderGetTypMod(query_match);
TupleDesc tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
HeapTupleData tuple;
bool isnull;

tuple.t_len = HeapTupleHeaderGetDatumLength(query_match);
ItemPointerSetInvalid(&(tuple.t_self));
tuple.t_tableOid = InvalidOid;
tuple.t_data = query_match;

query = DatumGetTextP(fastgetattr(&tuple, 1, tupdesc, &isnull));
nlimit = DatumGetFloat4(fastgetattr(&tuple, 2, tupdesc, &isnull));

ReleaseTupleDesc(tupdesc);
}
else
query = PG_GETARG_TEXT_P(1);

After this code you should execute the query using index:

select t,similarity(t,'qwertyu0988') as sml from test_trgm where t %
row('qwertyu0988', 0.6)::pg_trgm_match;

I got the query from the regression test. And of course the code need to be
checked for bugs.

> 2. What to do with `gtrgm_distance`?
>

You do not need to change gtrgm_distance(). It is used only in ORDER BY
clause to calculate distances. To calculate distance you do not need
threshold.

>
> Thanks for help.
> --
> Greg Navis
> I help tech companies to scale Heroku-hosted Rails apps.
> Free, biweekly scalability newsletter for SaaS CEOs
> <http://www.gregnavis.com/newsletter/>
>
>

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Christian Ohler 2016-06-12 02:03:55 Sequences, txids, and serial order of transactions
Previous Message Adrian Klaver 2016-06-11 14:12:45 Re: pgAdmin 4 beta not working on Windows 10