Quick Links

Re: btree_gin and BETWEEN

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject:	Re: btree_gin and BETWEEN
Date:	2015-06-23 19:20:14
Message-ID:	6776.1435087214@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> If I use the btree_gin extension to build a gin index on a scalar value, it
> doesn't work well with BETWEEN queries. It looks like it scans the whole
> index, with the part of the index between the endpoints getting scanned
> twice. It is basically executed as if "col1 between x and y" were "col1
> between -inf and y and col1 between x and +inf".

> It puts the correct tuples into the bitmap, because whichever inequality is
> not being used to set the query endpoint currently is used as a filter
> instead.

> So I could just not build that index. But I want it for other reasons, and
> the problem is that the planner thinks the index can implement the BETWEEN
> query efficiently. So even if it has truly better options available, it
> switches to using a falsely attractive btree_gin index.

> create table foo as select random() as btree, random() as gin from
> generate_series(1,3000000);
> create index on foo using gin (gin);
> create index on foo using btree (btree);
> explain ( analyze, buffers) select count(*) from foo where btree between
> 0.001 and 0.00105;
> explain ( analyze, buffers) select count(*) from foo where gin between
> 0.001 and 0.00105;

> It would be nice if btree_gin supported BETWEEN and other range queries
> efficiently, or at least if the planner knew it couldn't support them
> efficiently. But I don't see where to begin on either one of these tasks.
> Is either one of them plausible?

ISTM this is a bug/oversight in the core GIN code: it has multiple
GinScanEntry's for the same index column, but fails to realize that it
could start the index scans at the largest match value of any of those
GinScanEntry's. This did not matter so much before the partial-match
feature went in; but with a partial match we might be talking about
scanning a large fraction of the index, and it's important to optimize
the start point if possible. I think there is also a failure to notice
when we could stop the scans because one of a group of AND'ed entries
says no more matches are possible.

I'm not sure where the best place to hack in this consideration would
be, though. Oleg, Teodor, any thoughts?

regards, tom lane

In response to

btree_gin and BETWEEN at 2015-06-23 16:29:46 from Jeff Janes

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jim Nasby	2015-06-23 19:23:33	Re: less log level for success dynamic background workers for 9.5
Previous Message	Robert Haas	2015-06-23 18:59:06	Re: Should we back-patch SSL renegotiation fixes?