Re: Patch: pg_trgm: gin index scan performance for similarity search

From:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To:	Fornaroli Christophe <cfornaro(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Patch: pg_trgm: gin index scan performance for similarity search
Date:	2015-12-24 18:06:09
Message-ID:	CAPpHfduvmuQRzmKUWG-i0EgAw=NhDH=3PfDQ6jdnpsxcSx0GvA@mail.gmail.com
Lists:	pgsql-hackers

Hi, Christophe!

On Thu, Dec 24, 2015 at 6:28 PM, Fornaroli Christophe <cfornaro(at)gmail(dot)com>
wrote:

> This code uses this upper bound for the similarity: ntrue / (nkeys -
> ntrue). But if there are ntrue trigrams in common, we know that the indexed
> string is at least ntrue trigrams long. We can then use a more aggressive
> upper bound: ntrue / (ntrue + nkeys - ntrue) or ntrue / nkeys. Attached is
> a patch that changes this.
>

Good catch, thank you! The estimate in pg_trgm was not optimal.
I think it would be good to add a comment that explicitly states why we
use this upper bound.
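
To spell out the reasoning behind the bound, here is a minimal sketch in
Python. It is not the pg_trgm C code: the function names are made up for
illustration, and it assumes similarity is computed as shared trigrams
divided by the size of the union, with 0.3 as the similarity limit. The
indexed string has at least ntrue trigrams, so the union has at least
nkeys of them, and the similarity cannot exceed ntrue / nkeys.

```python
# Illustrative sketch only; all names here are hypothetical, not pg_trgm's.

def exact_similarity(ntrue, nkeys, nindexed):
    """Shared trigrams divided by the union of both trigram sets (assumed definition)."""
    return ntrue / (nkeys + nindexed - ntrue)


def old_upper_bound(ntrue, nkeys):
    """Looser bound from the quoted text: ntrue / (nkeys - ntrue)."""
    if ntrue == nkeys:
        return float("inf")  # avoid division by zero; every query trigram matched
    return ntrue / (nkeys - ntrue)


def new_upper_bound(ntrue, nkeys):
    """Tighter bound: the indexed string has at least ntrue trigrams,
    so the union is at least nkeys and similarity <= ntrue / nkeys."""
    return ntrue / nkeys if nkeys else 0.0


def may_match(ntrue, nkeys, limit=0.3):
    """Keep an index entry only if the upper bound can still reach the limit."""
    return nkeys > 0 and new_upper_bound(ntrue, nkeys) >= limit


if __name__ == "__main__":
    ntrue, nkeys = 3, 12
    print("old bound:", old_upper_bound(ntrue, nkeys))
    print("new bound:", new_upper_bound(ntrue, nkeys))
    print("may match:", may_match(ntrue, nkeys))
```

With 3 of 12 query trigrams matched, the old bound still exceeds 0.3 and
the entry would have to be rechecked, while the new bound is 0.25 and the
entry can be skipped at the index level.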

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Patch: pg_trgm: gin index scan performance for similarity search at 2015-12-24 15:28:14 from Fornaroli Christophe
