Re: Planner reluctant to start from subquery

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgreSQL(dot)org
Subject: Re: Planner reluctant to start from subquery
Date: 2006-02-02 16:58:24
Message-ID: 16717.1138899504@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes [offlist]:
> Attached is a pg_dump -c file with only the required rows (none of
> which contain confidential data), and 0.1% of the rows from the larger
> tables. It does show the same pattern of costing and plan choice.

Thanks for the test case. The first thing I found out was that HEAD
does generate the fast plan from the IN case, while 8.1 does not, and
after a bit of digging the reason became clear. The initial state
that the planner starts from is essentially

SELECT ... FROM ((C JOIN P) LEFT JOIN WPCT) IN-JOIN D

(IN-JOIN being a notation for the way the planner thinks about IN, which
is that it's a join with some special runtime behavior). The problem
with this is that outer joins don't always commute with other joins,
and up through 8.1 we didn't have any code to analyze whether or not
re-ordering outer joins is safe. So we never did it at all. HEAD does
have such code, and so it is able to re-order the joins enough to
generate the fast plan, which is essentially

SELECT ... FROM ((C IN-JOIN D) JOIN P) LEFT JOIN WPCT

This is why eliminating the OUTER JOIN improved things for you. Your
manual rearrangement into a JOIN-with-GROUP-BY inside the OUTER JOIN
essentially duplicates the IN-JOIN rearrangement that HEAD is able to
do for itself.

BTW, the reason why getting rid of the OR improved matters is that:
(a) with the "WPCT"."profileName" IS NOT NULL part as a top-level WHERE
clause, the planner could prove that it could reduce the OUTER JOIN to
a JOIN (because no null-extended row would pass that qual), whereupon
it had join order flexibility again.
(b) with the "C"."caseType" = 'PA' AND "C"."isConfidential" = false
part as a top-level WHERE clause, there still wasn't any join order
flexibility, but this added restriction on C reduced the number of C
rows enough that there wasn't a performance problem anyway.

So it's all fairly clear now what is happening. The good news is we
have this fixed for 8.2, the bad news is that that patch is much too
large to consider back-patching.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Alan Stange 2006-02-02 19:41:19 Re: Huge Data sets, simple queries
Previous Message Marc Morin 2006-02-02 16:27:38 Re: partitioning and locking problems