fast case-insensitive sort

From: "Sigi Jekabsons" <sigi(dot)j(at)workskillsprofessionals(dot)com(dot)au>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: fast case-insensitive sort
Date: 2003-04-15 00:49:23
Message-ID: 002e01c302e8$db771ab0$7c01a8c0@microvac
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I'm having trouble getting postgres to use an index when doing an ORDER BY
UPPER(surname), for instance. I can create an index on UPPER(surname), but
it doesn't use it in the query - is there a better way of doing a fast case
insensitive sort?

This is the query plan of ORDER BY surname, which uses the index I've
created on surname:

asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY surname;
QUERY PLAN
----------------------------------------------------------------------------
-----------------------------------------------------------
Index Scan using cands_surname_idx on cands (cost=0.00..3983.55 rows=31348
width=1971) (actual time=0.27..647.09 rows=31348 loops=1)
Total runtime: 664.62 msec
(2 rows)

Note the time taken.
This is the query plan of sorting by UPPER(surname) before and after
creating the index:

asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY UPPER(surname);
QUERY PLAN
----------------------------------------------------------------------------
---------------------------------------
Sort (cost=39701.56..39779.93 rows=31348 width=1971) (actual
time=3384.08..3444.77 rows=31348 loops=1)
Sort Key: upper((surname)::text)
-> Seq Scan on cands (cost=0.00..1165.48 rows=31348 width=1971) (actual
time=0.06..628.09 rows=31348 loops=1)
Total runtime: 3875.90 msec
(4 rows)

asp_employ=# CREATE INDEX cand_sur_up_idx ON cands (UPPER(surname));
CREATE INDEX
asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY UPPER(surname);
QUERY PLAN
----------------------------------------------------------------------------
---------------------------------------
Sort (cost=39701.56..39779.93 rows=31348 width=1971) (actual
time=3368.43..3429.51 rows=31348 loops=1)
Sort Key: upper((surname)::text)
-> Seq Scan on cands (cost=0.00..1165.48 rows=31348 width=1971) (actual
time=0.07..621.78 rows=31348 loops=1)
Total runtime: 3859.33 msec
(4 rows)

The Seq Scan is much slower. It may be interesting to note that this uses
the index:

asp_employ=# explain analyse select * from cands where
upper(surname)='SMITH';
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------
Index Scan using cand_sur_up_idx on cands (cost=0.00..578.66 rows=157
width=1972) (actual time=54.52..60.06 rows=224 loops=1)
Index Cond: (upper((surname)::text) = 'SMITH'::text)
Total runtime: 60.36 msec
(3 rows)

I created a thread on this topic on the Ars Technica forums, but there was
not a solution - the discussion for reference only is here:
http://arstechnica.infopop.net/OpenTopic/page?a=tpc&s=50009562&f=6330927813&
m=1070953065

So should I be using a specific type of index, or is there a better way?

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Stephan Szabo 2003-04-15 02:21:45 Re: fast case-insensitive sort
Previous Message Tom Lane 2003-04-14 23:34:45 Re: help with query!!!