Re: Poor performance with queries using clause: sth IN (...)

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Andrzej Zawadzki" <zawadaa(at)wp(dot)pl>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Poor performance with queries using clause: sth IN (...)
Date: 2007-05-09 14:29:58
Message-ID: 87wszigjdl.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


> AND '' IN ('', upper(b.nazwisko))
> AND '' IN ('', upper(b.imie))
> AND '78111104485' IN ('', b.pesel)
> AND '' IN ('', upper(trim(b.dowseria)))
> AND '' IN ('', b.dowosnr)
> AND 0 IN (0, b.typkred)
> AND k.datazwrot IS NULL;

Hum, interesting. Most of the work Postgres does with IN clauses is on the
assumption that the column you're trying to restrict is on the left hand side
of the IN clause.

1) I think you'll be much better off expanding these into OR clauses.

2) I assume the left hand sides of the IN clauses are actually parameters? I
would recommend using bound parameters mostly for security but also for
performance reasons in that case.

3) having upper() and trim() around the columns makes it basically impossible
for the planner to use indexes even if it was capable of expanding the IN
clauses into OR expressions. Your options are either

a) use an expression index, for example
CREATE INDEX idx_nazwisko on kredytob (upper(nazwisko))

b) use a case-insensitive locale (which you may already be doing) in which
case the upper() is simply unnecessary.

c) use the citext data type (or a case insensitive indexable operator but we
don't seem to have a case insensitive equals, only LIKE and regexp
matches? That seems strange.)

4) You should consider using text or varchar instead of char(). char() has no
performance advantages in Postgres and is annoying to work with.

Something like this with expression indexes on upper(nazwisko), upper(imie),
upper(trim(downseria)) would actually be optimized using indexes:

AND (? = '' OR upper(b.nazwisko) = ?)
AND (? = '' OR upper(b.imie) = ?)
AND (? = '' OR b.pesel = ?)
AND (? = '' OR upper(trim(b.downseria)) = ?)
AND (? = '' OR b.dowosnr = ?)
AND (? = 0 OR b.typkred = ?)
AND k.datazwrot IS NULL

If this is the only query or a particularly important query you could consider
making all those indexes partial with "WHERE datazwrot IS NULL" as well.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Gregory Stark 2007-05-09 14:34:00 Re: Nested loops overpriced
Previous Message Carlos Moreno 2007-05-09 14:29:56 Re: Throttling PostgreSQL's CPU usage