Re: PATCH: CITEXT 2.0 v2

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Subject: Re: PATCH: CITEXT 2.0 v2
Date: 2008-07-08 00:34:39
Message-ID: 910F6761-133F-4F67-9CA2-1E8771E638EF@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks to help from RhodiumToad on IRC, I got some things improved here:

On Jul 7, 2008, at 16:24, David E. Wheeler wrote:

> So for some reason, after adding the indexes, the queries against
> the CITEXT column aren't using them. Furthermore, the `lower(text)
> LIKE lower(?)` query isn't using *its* index. Huh?

I never knew what one needed to use the text_pattern_ops operator
class to index a column for use with LIKE! I had no clue. Would that
work for a citext column, too, since it's essentially the same as TEXT?

> So this leaves me with two questions:
>
> 1. For what reason would the query against the citext column *not*
> use the index?

It turns out that it did use the index if I put `SET enable_seqscan =
false;` into my script. So with RhodiumToad's direction, I added some
`RESTRICT` and `JOIN` clauses to my comparison operators (copying them
from ip4r). So now I have:

CREATE OPERATOR = (
LEFTARG = CITEXT,
RIGHTARG = CITEXT,
COMMUTATOR = =,
NEGATOR = <>,
PROCEDURE = citext_eq,
RESTRICT = eqsel,
JOIN = eqjoinsel,
HASHES,
MERGES
);

CREATE OPERATOR <> (
LEFTARG = CITEXT,
RIGHTARG = CITEXT,
NEGATOR = =,
COMMUTATOR = <>,
PROCEDURE = citext_ne,
RESTRICT = neqsel,
JOIN = neqjoinsel
);

CREATE OPERATOR < (
LEFTARG = CITEXT,
RIGHTARG = CITEXT,
NEGATOR = >=,
COMMUTATOR = >,
PROCEDURE = citext_lt,
RESTRICT = scalarltsel,
JOIN = scalarltjoinsel
);

CREATE OPERATOR <= (
LEFTARG = CITEXT,
RIGHTARG = CITEXT,
NEGATOR = >,
COMMUTATOR = <=,
PROCEDURE = citext_le,
RESTRICT = scalarltsel,
JOIN = scalarltjoinsel
);

CREATE OPERATOR >= (
LEFTARG = CITEXT,
RIGHTARG = CITEXT,
NEGATOR = <,
COMMUTATOR = <=,
PROCEDURE = citext_ge,
RESTRICT = scalargtsel,
JOIN = scalargtjoinsel
);

CREATE OPERATOR > (
LEFTARG = CITEXT,
RIGHTARG = CITEXT,
NEGATOR = <=,
COMMUTATOR = <,
PROCEDURE = citext_gt,
RESTRICT = scalargtsel,
JOIN = scalargtjoinsel
);

With this change, the index was used:

Loading words from dictionary.
Inserting into the table.

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 261.295 ms
SELECT * FROM try WHERE citext = 'food';
Time: 289.304 ms
Time: 1228.961 ms

Adding indexes...

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 2.018 ms
SELECT * FROM try WHERE citext = 'food';
Time: 0.788 ms

Seems to be faster than the LOWER() version, too, which makes me
happy. The output from EXPLAIN ANALYZE:

try=# EXPLAIN ANALYZE SELECT * FROM try WHERE citext = 'food';

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Index Scan using idx_try_citext on try (cost=0.00..8.31 rows=1
width=119) (actual time=0.324..0.324 rows=0 loops=1)
Index Cond: (citext = 'food'::citext)
Total runtime: 0.377 ms
(3 rows)

try=# EXPLAIN ANALYZE SELECT * FROM try WHERE LOWER(text) =
LOWER('food');

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on try (cost=28.17..1336.10 rows=500 width=119)
(actual time=0.170..0.170 rows=0 loops=1)
Recheck Cond: (lower(text) = 'food'::text)
-> Bitmap Index Scan on idx_try_text (cost=0.00..28.04 rows=500
width=0) (actual time=0.168..0.168 rows=0 loops=1)
Index Cond: (lower(text) = 'food'::text)
Total runtime: 0.211 ms
(5 rows)

So my only other question related to this is:

* Are the above RESTRICT and JOIN functions the ones to use, or is
there some way to make use of those used by the TEXT type that would
be more appropriate?

> 2. Is there some way to get the CITEXT index to behave like a
> LOWER() index, that is, so that its value is stored using the result
> of the str_tolower() function, thus removing some of the overhead of
> converting the values for each row fetched from the index? (Does
> this question make any sense?)

Given the performance with an index, I think that this is moot, yes?
There is, of course, much more overhead for a table scan.

Best,

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message imad 2008-07-08 00:38:05 Re: Data type OID numbers fixed?
Previous Message David E. Wheeler 2008-07-08 00:26:24 Re: PATCH: CITEXT 2.0