Re: Index-only scan for btree_gist turns bpchar to char

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Japin Li <japinli(at)hotmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Index-only scan for btree_gist turns bpchar to char
Date: 2022-01-06 19:21:15
Message-ID: 1162796.1641496875@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Japin Li <japinli(at)hotmail(dot)com> writes:
> On Thu, 06 Jan 2022 at 00:34, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The minimum-effort fix would be to apply rtrim1 to both strings
>> in gbt_bpchar_consistent, but I wonder if we can improve on that
>> by pushing the ignore-trailing-spaces behavior further down.
>> I didn't look yet at whether gbt_var_consistent can support
>> any type-specific behavior.

> Adding type-specific for gbt_var_consistent looks like more generally.
> For example, for bpchar type, we should call bpchareq rather than texteq.

I looked at this and it does seem like it might work, as per attached
patch. The one thing that is troubling me is that the opclass is set
up to apply gbt_text_same, which is formally the Wrong Thing for bpchar,
because the equality semantics shouldn't be quite the same. But we
could not fix that without a module version bump, which is annoying.
I think that it might not be necessary to change it, because

(1) There's no such thing as unique GIST indexes, so it should not
matter if the "same" function is a bit stricter than the datatype's
nominal notion of equality. It's certainly okay for that to vary
from the semantics applied by the consistent function --- GIST has
no idea that the consistent function is allegedly testing equality.

(2) If all the input values for a column have been coerced to the same
typmod, then it doesn't matter because two values that are equal after
space-stripping would be equal without space-stripping, too.

However, (2) doesn't hold for an existing index that the user has failed
to REINDEX, because then the index would contain some space-stripped
values that same() will not say are equal to incoming new values.
Again, I think this doesn't matter much, but maybe I'm missing
something. I've not really dug into what GIST uses the same()
function for.

In any case, if we do need same() to implement the identical
behavior to bpchareq(), then the other solution isn't sufficient
either.

So in short, it seems like we ought to do some compatibility testing
and see if this code misbehaves at all with an index created by the
old code. I don't particularly want to do that ... any volunteers?

regards, tom lane

Attachment Content-Type Size
fix-btree_gist-bpchar-trimming.patch text/x-diff 4.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2022-01-06 19:22:33 Re: Collecting statistics about contents of JSONB columns
Previous Message Tomas Vondra 2022-01-06 19:12:42 Re: Collecting statistics about contents of JSONB columns