Re: reducing number of ANDs speeds up query

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: "T(dot) E(dot) Lawrence" <t(dot)e(dot)lawrence(at)icloud(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: reducing number of ANDs speeds up query
Date: 2013-01-12 14:34:04
Message-ID: CAF-3MvNS94v9+gPFvW=eF8WO9=QsbwAP3pK3PmPuSg+ism5fpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 12 January 2013 12:41, T. E. Lawrence <t(dot)e(dot)lawrence(at)icloud(dot)com> wrote:

> Hi and thank you for your notes!
>
> > You really ought to include the output of EXPLAIN ANALYZE in cases such
> as these (if it doesn't already point you to the culprit).
>
> I'll do so, it takes quite long...
>
> > Most likely you'll find that the last condition added a sequential scan
> to the query plan,
>
> Exactly! EXPLAIN says so.
>
> > which can have several causes/reasons. Are the estimated #rows close to
> the actual #rows?
>
> Yes, this is the problem. I read that in such cases indexes are not read.
> However if the previous conditions are executed first, the result is zero
> or just a few rows and there is no need seq scan the whole values column.
>

You mean they don't match, do you?

The database doesn't know what you know and its making the wrong decision
based on incorrect data.

The database won't use an index if it thinks that there aren't many rows to
check against a condition. Most likely (the results from explain analyze
would tell) the database thinks there are much fewer rows in table b than
there actually are.

You'll probably want to read about database maintenance for Postgres and
how to keep its statistics up to date. Autovacuum may need some tuning or
you need to run manual VACUUM more frequently.
In fact, run VACUUM now and see if the problem goes away.

You'll usually also want to run VACUUM after a large batch job.

> > Meanwhile, it looks like most of your AND's are involved in joining
> tables a and b. Perhaps it helps to use an explicit join instead of an
> implicit one?
>
> I am not quite sure what this means, but will read about it.
>

You're currently using implicit joins by combining your join conditions in
the WHERE clause of your query, like this:
SELECT *
FROM a, b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND b.value = 'yadayada';

You can also explicitly put your join conditions with the joins, like so:
SELECT *
FROM a INNER JOIN b ON (a.col1 = b.col1 AND a.col2 = b.col2)
WHERE b.value = 'yadayada';

You explicitly tell the database that those are the conditions to be joined
on and that the remaining conditions are filters on the result set. With
just two tables the need for such isn't that obvious, but with more tables
it quickly becomes difficult to see what condition in an implicit join is
part of the joins and which is the result set filter. With explicit joins
that's much clearer.
It wouldn't be the first time that I rewrite a query to use explicit joins,
only to find that the original query was incorrect.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Janes 2013-01-12 21:02:27 Re: changes "during checkpointing"
Previous Message Tony Theodore 2013-01-12 12:47:19 Re: reducing number of ANDs speeds up query