Quick Links

Re: PATCH: CITEXT 2.0 v2

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: PATCH: CITEXT 2.0 v2
Date:	2008-07-07 23:24:27
Message-ID:	551B62DD-77F5-4CA3-9C6E-14E38A64EF26@kineticode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Jul 7, 2008, at 08:01, Andrew Dunstan wrote:

> What does still bother me is its performance. I'd like to know if
> any measurement has been done of using citext vs. a functional index
> on lower(foo).

Okay, here's a start. The attached script inserts random strings of
1-10 space-delimited words into text and citext columns, and then
compares the performance of queries with and without indexes. The
output for me is as follows:

Loading words from dictionary.
Inserting into the table.

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 254.254 ms
SELECT * FROM try WHERE citext = 'food';
Time: 288.535 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.385 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 236.186 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 235.818 ms

Adding indexes...

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 1.260 ms
SELECT * FROM try WHERE citext = 'food';
Time: 277.755 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.073 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 238.430 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 238.685 ms
benedict%

So for some reason, after adding the indexes, the queries against the
CITEXT column aren't using them. Furthermore, the `lower(text) LIKE
lower(?)` query isn't using *its* index. Huh?

So this leaves me with two questions:

1. For what reason would the query against the citext column *not* use
the index?

2. Is there some way to get the CITEXT index to behave like a LOWER()
index, that is, so that its value is stored using the result of the
str_tolower() function, thus removing some of the overhead of
converting the values for each row fetched from the index? (Does this
question make any sense?)

Thanks,

David

In response to

Re: PATCH: CITEXT 2.0 v2 at 2008-07-07 15:01:35 from Andrew Dunstan

Responses

Re: PATCH: CITEXT 2.0 v2 at 2008-07-07 23:26:36 from David E. Wheeler
Re: PATCH: CITEXT 2.0 v2 at 2008-07-08 00:34:39 from David E. Wheeler

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David E. Wheeler	2008-07-07 23:26:36	Re: PATCH: CITEXT 2.0 v2
Previous Message	David E. Wheeler	2008-07-07 23:09:18	Re: \SET QUIET and \timing