Re: Query plan different depending on the value of where filter

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Looby, Denis" <denis(dot)looby(at)hp(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query plan different depending on the value of where filter
Date: 2013-07-17 17:04:09
Message-ID: 6847.1374080649@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Looby, Denis" <denis(dot)looby(at)hp(dot)com> writes:
> What I don't understand is why the plan is different just because the group_id = has changed value?
> Does the planner have some statistical info on the contents of non-indexed rows?

Of course. In this case it knows that a nestloop would be a loser
because many rows would be fetched from the outer table.

> I don't quite understand why this plan executes the sequential scan once, whereas the slow one does it 5001 times, which I believe is the main source of the difference.

In the hash join case, it builds the hash table, discovers that it's
empty (because there are no rows with type = 2), and concludes that it
need not scan the outer relation. This is a corner case that the
planner doesn't try to account for because it couldn't be predicted
reliably. I wouldn't get too excited about it if I were you, because
presumably the case isn't going to happen all that much in production
either (else why are you bothering with the query?).

Personally I'd suggest getting rid of the
WHERE outer_tab.outer_key = inner_tab_1.key_to
clause, which is entirely redundant with the IN condition and is
preventing the planner from seeing that it could optimize the IN
into a semijoin.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message rox 2013-07-17 20:05:05 What are my options to avoid a Row Exclusive/RowShareLock conflict / is there a better way?
Previous Message Looby, Denis 2013-07-17 16:08:51 Query plan different depending on the value of where filter