From: | Mathieu Fenniak <mathieu(dot)fenniak(at)replicon(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Complex filters -> Bad row estimates -> bad query plan |
Date: | 2019-08-21 15:53:22 |
Message-ID: | CAHoiPjzcv4T+5a7giM4925Ltj+VegQkmG6+_1PUtGX3-dZ141A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi all,
I have a database query where I have a number of "simple" where clauses, a
number of "complex" subquery based where clauses, and one NOT EXISTS where
clause; it looks something like this:
SELECT ...some fields... FROM Table1
WHERE
Field1 IN (1, 2, 3, 4, 21, 24) AND -- simple filter
Field2 <> 1 AND -- simple filter
Field3 >= '2019-07-08' AND -- simple filter
Field3 <= '2019-08-18' AND -- simple filter
NOT EXISTS (SELECT 1 FROM Table2 WHERE Table2.Status = 2 AND
Table2.SomeId = Table1.Id) AND -- anti-join
COALESCE((SELECT Status FROM Table3 WHERE Table3.SomeId = Table1.Id),
(SELECT Status FROM Table4 WHERE Table4.SomeId = Table1.Id)) = 2 --
"complex" condition
The problem I'm encountering is that I've observed degraded performance in
some cases where the Anti Join merge for the NOT EXISTS clause is planned
based upon poor row estimates for Table1. All of the other filters, and
the SubPlan filter(s) for the complex clauses, result in the query planner
estimating that only 1 row from Table1 will be resulting, so a Nested Loop
Anti Join is used and the RHS of that nested loop in a seqscan on Table2.
The reality is that many thousands of records match all the conditions; a
Merge Anti Join or Hash Anti Join would be a better query plan.
I've tested the query planner with just the simpler conditions, and it
makes pretty reasonable estimates about the row count (+/- 10%). Adding
the NOT EXISTS results in a Merge Anti Join, and performance is great.
Adding the more "complex" conditions (there are potentially multiple of
these subquery plan searches) results in the estimated row count dropping
to 1, and, performance dives.
I know there are no "query hints" in PostgreSQL... any thoughts on
alternative approaches here? The only option I've used in the past for
this is creating specialized indexes, which can provide more targeted
statistics; but it's not applicable here since the "complex" conditions use
data from another table in a subquery.
Appreciate any thoughts, theories, or directions. :-) Thanks,
Mathieu
From | Date | Subject | |
---|---|---|---|
Next Message | Rob Sargent | 2019-08-21 17:54:14 | Re: SELECT all the rows where id is children of other node. |
Previous Message | Adrian Klaver | 2019-08-21 15:47:05 | Re: Rename a column if not already renamed.? |