Re: [SQL] Searching Text Fields - Case Sensitive?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Moray McConnachie" <moray(dot)mcconnachie(at)computing-services(dot)oxford(dot)ac(dot)uk>
Cc: pgsql-sql(at)postgreSQL(dot)org
Subject: Re: [SQL] Searching Text Fields - Case Sensitive?
Date: 1999-08-13 14:27:15
Message-ID: 4269.934554435@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

"Moray McConnachie" <moray(dot)mcconnachie(at)computing-services(dot)oxford(dot)ac(dot)uk> writes:
>> SELECT * FROM mydb WHERE lower(description) LIKE "%foobar%";

> Brings up an interesting question. Which is faster, to use the
> lower/upper functions as above (and of course if foobar is a variable,
> it would also have to be made lower case by the program calling
> pgsql), or to use the pattern matching case-insensitive operator?

Offhand I'd guess that the case-insensitive regex would be faster than
the above, because it avoids the overhead of making an extra function
call. (Functions that return text, or any other variable-length data
type, require a memory allocation step --- for typical string lengths
I'd imagine the malloc costs more than the actual character-slinging...)

*However*, the situation changes considerably if you have an index
on lower(description) and a pattern that allows the index to be used.
The above pattern does not, but any left-anchored pattern, say
'foobar%', would allow index restriction clauses to be generated and
used. The effective query with a left-anchored pattern is like this:

... WHERE lower(description) LIKE "foobar%" AND
lower(description) >= "foobar" AND
lower(description) < "foobas";

(actually there are some character-set issues that complicate matters,
but that's the basic idea). When you have an index on
lower(description), the system can and will use the last two clauses
to restrict an indexscan so that tuples outside the range foobar to
foobas are never even fetched. Obviously, this can make for a speedup
of orders of magnitude, swamping any nitpicky little questions of how
fast a particular expression can be evaluated.

If you use a case-insensitive regex then this doesn't work, even for a
left-anchored pattern ('^foobar'), because the system knows that 'f' can
match either 'f' or 'F' so it can't generate the index restriction
clause...

regards, tom lane

Browse pgsql-sql by date

  From Date Subject
Next Message Hutton, Rob 1999-08-13 14:36:56 RE: [SQL] Problems with default date and time
Previous Message Hutton, Rob 1999-08-13 14:25:29 RE: [SQL] Problems with default date and time