Quick Links

Re: Index plan returns different results to sequential scan

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>, John Burns <john(at)impactdatametrics(dot)com>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: Index plan returns different results to sequential scan
Date:	2024-03-23 11:24:42
Message-ID:	fdb33101-a974-481c-ada3-0d7ab052c656@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On 3/23/24 03:24, Peter Geoghegan wrote:
> On Thu, Mar 21, 2024 at 2:03 PM John Burns <john(at)impactdatametrics(dot)com> wrote:
>> The query is SELECT * FROM XXX where postcode % ’NW10’
>> To create a sample table — create table XXX ( udprn bigint, postcode postcode )
>> To Index it CREATE INDEX on XXX(postcode)
>
> The opfamily's % operator uses the B-Tree equality strategy. This
> means that it works the same way as = works in most other opfamilies.
>
> I don't see how equality can work reliably here. A query with a
> predicate "WHERE my_indexed_postcode_column % ‘NW10’" seems to work by
> masking the value stored in the index, throwing away some amount of
> suffix bytes in the process. But the values from the index are still
> stored in their original order -- the temporarily masked suffix bytes
> aren't masked in the index, of course (they're only masked
> temporarily, by the cross-type equality operator %).
>
> Wouldn't you need something closer to "WHERE
> my_indexed_postcode_column >= ‘NW10’ and my_indexed_postcode_column <
> ‘NW11’" for this to work reliably?
>

Yeah, I was not sure how come this could be processed using equality
operator, but I got distracted by bisecting this to a particular commit.
I didn't realize the commit might have changed how much we rely on the
opfamily to do things correctly.

I do think the '%' operator is pretty close to what we do for LIKE with
prefix patterns for text:

explain select * from t where a like 'aa%';
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using t_a_idx on t (cost=0.29..4.31 rows=1 width=33)
Index Cond: ((a ~>=~ 'aa'::text) AND (a ~<~ 'ab'::text))
Filter: (a ~~ 'aa%'::text)
(3 rows)

So I guess postcode would need to do something similar - treat the '%'
operator as separate from opclass equality, and define a new support
procedure akin to text_support, translating the '%' into the range quals
that can match the index.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Index plan returns different results to sequential scan at 2024-03-23 02:24:52 from Peter Geoghegan

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Alexander Korotkov	2024-03-23 11:37:04	Re: [BUG] false positive in bt_index_check in case of short 4B varlena datum
Previous Message	Peter Geoghegan	2024-03-23 02:24:52	Re: Index plan returns different results to sequential scan