tsquery @> operator bugs

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: tsquery @> operator bugs
Date: 2014-10-25 20:11:20
Message-ID: 544C03E8.4020402@vmware.com
Lists: pgsql-bugs

While looking at all the places where we currently use CRC, I bumped
into this:

postgres=# select 'penomaha'::tsquery @> 'lbgimpca'::tsquery;
?column?
----------
t
(1 row)

The @> operator is supposed to return true if the first query contains
all the terms of the second query. The above result is bogus; the
strings are completely different. It returns true because both terms
have the same CRC (with our funky CRC algorithm), and the tsq_mcontains
function only compares the CRCs, not the actual values.
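
To show why comparing only checksums is unsafe, here is a minimal Python sketch. It does not reproduce PostgreSQL's CRC. `weak_hash` is a hypothetical stand-in (an 8-bit byte sum) chosen so that collisions are easy to produce, for example "ab" and "ba". Any checksum narrower than the space of possible terms must collide for some pair, so a checksum match can only serve as a prefilter. A correct check must then compare the actual strings.

```python
def weak_hash(term: str) -> int:
    """Hypothetical stand-in for a term checksum: 8-bit sum of the UTF-8 bytes.
    This is NOT PostgreSQL's CRC; it is deliberately weak so collisions are easy."""
    return sum(term.encode("utf-8")) & 0xFF


def contains_by_hash(query_terms, ex_terms):
    """Buggy check: a term of ex 'matches' if any query term has the same hash."""
    query_hashes = {weak_hash(t) for t in query_terms}
    return all(weak_hash(t) in query_hashes for t in ex_terms)


def contains_exact(query_terms, ex_terms):
    """Correct check: use the hash only to narrow candidates, then compare strings."""
    by_hash = {}
    for t in query_terms:
        by_hash.setdefault(weak_hash(t), set()).add(t)
    return all(t in by_hash.get(weak_hash(t), ()) for t in ex_terms)


# "ab" and "ba" have the same byte sum, so the hash-only check reports a bogus match.
print(contains_by_hash(["ab"], ["ba"]))  # True (wrong)
print(contains_exact(["ab"], ["ba"]))    # False
```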

Another bug is that the function performs a length check first, and
returns false if the second query is larger (has more items) than the
first. The thinking goes that the first query cannot possibly contain
the second one if the second is larger. But that doesn't take into
account that there can be duplicate terms (this is basically the same
bug that was recently fixed in jsonb):

postgres=# select 'a & b' @> 'a & a'::tsquery; /* CORRECT */
?column?
----------
t
(1 row)

postgres=# select 'a' @> 'a & a'::tsquery; /* WRONG */
?column?
----------
f
(1 row)
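
The flaw in the size shortcut can be sketched in Python. As a simplifying assumption, each query is modeled as a plain list of its operand terms, with duplicates kept as they appear in the query text and operators omitted. The function names are hypothetical. Comparing list lengths rejects 'a' against 'a & a', although every term of the second query appears in the first. Comparing the sets of distinct terms gives the right answer.

```python
def contains_with_length_check(query_terms, ex_terms):
    """Buggy shortcut: assume a longer second query can never be contained."""
    if len(ex_terms) > len(query_terms):
        return False
    return set(ex_terms) <= set(query_terms)


def contains_distinct(query_terms, ex_terms):
    """Correct: only distinct terms matter, so compare sets rather than sizes."""
    return set(ex_terms) <= set(query_terms)


# Operand terms of 'a & b', 'a', and 'a & a' (operators omitted).
a_and_b, a, a_and_a = ["a", "b"], ["a"], ["a", "a"]

print(contains_with_length_check(a_and_b, a_and_a))  # True
print(contains_with_length_check(a, a_and_a))        # False (wrong)
print(contains_distinct(a, a_and_a))                 # True
```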

I propose the attached fix. It completely rewrites the tsq_mcontains
function, so that it first extracts all the strings from both tsqueries,
then sorts them and removes duplicates, and then compares the arrays.
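
The approach described above (extract, sort, deduplicate, compare) can be sketched in Python as follows. This is not the attached patch. `extract_terms` is a hypothetical helper that pulls terms out of a simplified query string by splitting on the `&`, `|`, `!` operators, parentheses and quotes. It is far cruder than the real tsquery parser and ignores weights and prefix markers. The comparison walks both sorted, duplicate-free arrays together and checks that every term of the second query also occurs in the first.

```python
import re


def extract_terms(query: str) -> list:
    """Hypothetical, simplified term extractor: split on operators, parens, quotes."""
    return [t for t in re.split(r"[&|!()'\s]+", query) if t]


def sorted_unique(terms):
    """Sort the terms and remove duplicates."""
    return sorted(set(terms))


def tsquery_contains(query: str, ex: str) -> bool:
    """Return True if every distinct term of `ex` also occurs in `query`."""
    q = sorted_unique(extract_terms(query))
    e = sorted_unique(extract_terms(ex))
    i = 0
    for term in e:
        # Both arrays are sorted, so skip query terms that sort before `term`.
        while i < len(q) and q[i] < term:
            i += 1
        if i == len(q) or q[i] != term:
            return False  # `term` is missing from the first query
        i += 1
    return True


print(tsquery_contains("penomaha", "lbgimpca"))  # False
print(tsquery_contains("a & b", "a & a"))        # True
print(tsquery_contains("a", "a & a"))            # True
```

Because the terms are compared as strings, a checksum collision can no longer produce a false match. Because duplicates are removed before comparing, the number of items in either query no longer matters.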

(I actually find the whole operator pretty useless. What is it good for?
But that's a different story.)

- Heikki

Attachment: fix-tsquery-contains-op-1.patch (text/x-diff, 3.3 KB)