Bad Estimate for complex query with JOINS on subselects and OR in where conditions

From: Peter Grman <peter(dot)grman(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Bad Estimate for complex query with JOINS on subselects and OR in where conditions
Date: 2019-08-16 11:39:09
Message-ID: CACF7Wx0twq1scHAuc3Q7itp727nfYZsj6EWYvFf1zYYjG1nJNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

our ORM with tenant separation enabled is creating the following query:

SELECT *
FROM "Reservation"."Reservation" AS q
LEFT JOIN (
SELECT q0.*
FROM "Reservation"."PaymentAccount" AS q0
WHERE q0."AccountCode" = 'A1'
) AS t ON q."PaymentAccountId" = t."Id"
INNER JOIN (
SELECT q1.*
FROM "Reservation"."NoShowFee" AS q1
WHERE q1."AccountCode" = 'A1'
) AS t0 ON q."NoShowFeeId" = t0."Id"
INNER JOIN (
SELECT q2.*
FROM "Policy_Fees"."Configuration" AS q2
WHERE q2."AccountCode" = 'A1'
) AS t1 ON t0."OriginalFeeConfigurationId" = t1."Id"
INNER JOIN (
SELECT q3.*
FROM "Reservation"."CancellationFee" AS q3
WHERE q3."AccountCode" = 'A1'
) AS t2 ON q."CancellationFeeId" = t2."Id"
INNER JOIN (
SELECT q4.*
FROM "Policy"."CancellationPolicy" AS q4
WHERE q4."AccountCode" = 'A1'
) AS t3 ON t2."OriginalCancellationPolicyId" = t3."Id"
INNER JOIN (
SELECT q5.*
FROM "Reservation"."Booking" AS q5
WHERE q5."AccountCode" = 'A1'
) AS t4 ON q."BookingId" = t4."Id"
INNER JOIN (
SELECT q6.*
FROM "Reservation"."Person" AS q6
WHERE q6."AccountCode" = 'A1'
) AS t5 ON t4."BookerId" = t5."Id"
INNER JOIN (
SELECT q7.*
FROM "Reservation"."Person" AS q7
WHERE q7."AccountCode" = 'A1'
) AS t6 ON q."PrimaryGuestId" = t6."Id"
WHERE (q."AccountCode" = 'A1') AND (((((t6."SearchVector" @@
to_tsquery(('foobar' || ':*'))) = TRUE) OR ((t5."SearchVector" @@
to_tsquery(('foobar' || ':*'))) = TRUE)) OR ((q."SearchVector" @@
to_tsquery(('foobar' || ':*'))) = TRUE)) OR EXISTS (
SELECT 1
FROM "Reservation"."TimeSlice" AS q8
LEFT JOIN (
SELECT q9.*
FROM "Inventory"."Unit" AS q9
WHERE q9."AccountCode" = 'A1'
) AS t7 ON q8."UnitId" = t7."Id"
WHERE ((q8."AccountCode" = 'A1') AND ((t7."SearchVector" @@
to_tsquery(('foobar' || ':*'))) = TRUE)) AND (q."Id" = q8."ReservationId")))

This is already optimized to really only include the tables that we need (I
left out the projection as we fetch almost every column from those tables).

When I tried to analyze the execution plan in http://tatiyants.com/pev/ I
saw a label "bad estimate" in one path of the execution plan:

[image: Screenshot_20190816_131944.png]

I looked around to see what it might mean, and found
https://www.postgresql.org/message-id/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com
but there the issues seemed to have been a long running transaction, which
we don't have (
https://www.postgresql.org/message-id/CAFh58O9ttZpyCM%2BnDb7DMQsi1ca%2BZ1c%3DB50eP2Vf3AQiNZsF9w%40mail.gmail.com
)

The is the execution plan created:

Nested Loop (cost=7.37..6041.66 rows=1 width=2792) (actual
time=1.930..528.372 rows=252 loops=1)
Join Filter: ((q7."SearchVector" @@ to_tsquery('foobar:*'::text)) OR
(q6."SearchVector" @@ to_tsquery('foobar:*'::text)) OR (q."SearchVector" @@
to_tsquery('foobar:*'::text)) OR (alternatives: SubPlan 1 or hashed SubPlan
2))
Rows Removed by Join Filter: 18800
Buffers: shared hit=622893
-> Nested Loop (cost=6.95..5995.95 rows=1 width=2463) (actual
time=0.104..297.051 rows=19052 loops=1)
Buffers: shared hit=383698
-> Nested Loop (cost=6.53..5992.42 rows=1 width=2134) (actual
time=0.091..239.030 rows=19052 loops=1)
Buffers: shared hit=307469
-> Nested Loop Left Join (cost=6.23..5990.48 rows=1
width=1759) (actual time=0.081..192.608 rows=19052 loops=1)
Buffers: shared hit=250274
-> Nested Loop (cost=5.94..5989.22 rows=1 width=1582)
(actual time=0.072..161.734 rows=19052 loops=1)
Buffers: shared hit=223445
-> Nested Loop (cost=5.66..5988.56 rows=2
width=1281) (actual time=0.062..130.338 rows=19052 loops=1)
Buffers: shared hit=166289
-> Nested Loop (cost=5.24..5960.88
rows=15 width=1137) (actual time=0.051..81.762 rows=19052 loops=1)
Buffers: shared hit=90036
-> Nested Loop (cost=4.82..5651.40
rows=110 width=620) (actual time=0.035..23.878 rows=19052 loops=1)
Buffers: shared hit=13824
-> Bitmap Heap Scan on
"CancellationPolicy" q4 (cost=4.40..53.38 rows=16 width=245) (actual
time=0.018..0.052 rows=16 loops=1)
Recheck Cond:
("AccountCode" = 'A1'::text)
Heap Blocks: exact=10
Buffers: shared hit=12
-> Bitmap Index Scan on
"IX_CancellationPolicy_AccountCode_EntityId" (cost=0.00..4.40 rows=16
width=0) (actual time=0.014..0.014 rows=16 loops=1)
Index Cond:
("AccountCode" = 'A1'::text)
Buffers: shared
hit=2
-> Index Scan using
"IX_CancellationFee_OriginalCancellationPolicyId" on "CancellationFee" q3
(cost=0.42..349.69 rows=19 width=375) (actual time=0.008..1.236 rows=1191
loops=16)
Index Cond:
("OriginalCancellationPolicyId" = q4."Id")
Filter: ("AccountCode" =
'A1'::text)
Buffers: shared hit=13812
-> Index Scan using
"IX_Reservation_CancellationFeeId" on "Reservation" q (cost=0.42..2.81
rows=1 width=517) (actual time=0.002..0.002 rows=1 loops=19052)
Index Cond:
("CancellationFeeId" = q3."Id")
Filter: ("AccountCode" =
'A1'::text)
Buffers: shared hit=76212
-> Index Scan using "PK_NoShowFee" on
"NoShowFee" q1 (cost=0.42..1.85 rows=1 width=144) (actual
time=0.002..0.002 rows=1 loops=19052)
Index Cond: ("Id" = q."NoShowFeeId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=76253
-> Index Scan using "PK_Configuration" on
"Configuration" q2 (cost=0.28..0.33 rows=1 width=301) (actual
time=0.001..0.001 rows=1 loops=19052)
Index Cond: ("Id" =
q1."OriginalFeeConfigurationId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=57156
-> Index Scan using "PK_PaymentAccount" on
"PaymentAccount" q0 (cost=0.29..1.26 rows=1 width=177) (actual
time=0.001..0.001 rows=0 loops=19052)
Index Cond: (q."PaymentAccountId" = "Id")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=26829
-> Index Scan using "PK_Booking" on "Booking" q5
(cost=0.29..1.94 rows=1 width=375) (actual time=0.002..0.002 rows=1
loops=19052)
Index Cond: ("Id" = q."BookingId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=57195
-> Index Scan using "PK_Person" on "Person" q6 (cost=0.42..3.53
rows=1 width=329) (actual time=0.002..0.002 rows=1 loops=19052)
Index Cond: ("Id" = q5."BookerId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=76229
-> Index Scan using "PK_Person" on "Person" q7 (cost=0.42..3.33 rows=1
width=329) (actual time=0.002..0.002 rows=1 loops=19052)
Index Cond: ("Id" = q."PrimaryGuestId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=76340
SubPlan 1
-> Nested Loop (cost=0.71..41.61 rows=1 width=0) (actual
time=0.007..0.007 rows=0 loops=18800)
Buffers: shared hit=162855
-> Index Scan using
"IX_TimeSlice_ReservationId_RatePlanId_UnitGroupId_FromUtc_T~" on
"TimeSlice" q8 (cost=0.42..24.78 rows=1 width=8) (actual time=0.003..0.003
rows=2 loops=18800)
Index Cond: (q."Id" = "ReservationId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=81574
-> Index Scan using "PK_Unit" on "Unit" q9 (cost=0.29..8.56
rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=33918)
Index Cond: ("Id" = q8."UnitId")
Filter: (("AccountCode" = 'A1'::text) AND ("SearchVector"
@@ to_tsquery('foobar:*'::text)))
Rows Removed by Filter: 1
Buffers: shared hit=81281
SubPlan 2
-> Nested Loop (cost=32.69..187.36 rows=1 width=8) (never executed)
-> Bitmap Heap Scan on "Unit" q9_1 (cost=32.26..40.63 rows=1
width=8) (never executed)
Recheck Cond: ("SearchVector" @@
to_tsquery('foobar:*'::text))
Filter: ("AccountCode" = 'A1'::text)
-> Bitmap Index Scan on "IX_Unit_SearchVector"
(cost=0.00..32.26 rows=2 width=0) (never executed)
Index Cond: ("SearchVector" @@
to_tsquery('foobar:*'::text))
-> Index Scan using "IX_TimeSlice_UnitId" on "TimeSlice" q8_1
(cost=0.42..146.70 rows=4 width=16) (never executed)
Index Cond: ("UnitId" = q9_1."Id")
Filter: ("AccountCode" = 'A1'::text)
Planning Time: 19.838 ms
Execution Time: 528.758 ms

I'm aware that those OR conditions make it worse than it has to be, using
AND instead for the where condition like this:

WHERE (q."AccountCode" = 'A1') AND (((((t6."SearchVector" @@
to_tsquery(('foobar' || ':*'))) = TRUE) AND ((t5."SearchVector" @@
to_tsquery(('foobar' || ':*'))) = TRUE)) AND ((q."SearchVector" @@
to_tsquery(('foobar' || ':*'))) = TRUE)) AND EXISTS (
SELECT 1
FROM "Reservation"."TimeSlice" AS q8
LEFT JOIN (
SELECT q9.*
FROM "Inventory"."Unit" AS q9
WHERE q9."AccountCode" = 'A1'
) AS t7 ON q8."UnitId" = t7."Id"
WHERE ((q8."AccountCode" = 'A1') AND ((t7."SearchVector" @@
to_tsquery(('foobar' || ':*'))) = TRUE)) AND (q."Id" = q8."ReservationId")))

I get a seemingly similar plan, with all those nested loops, but without
the "bad estimate" label and it's supposed to be much faster in execution:

Nested Loop (cost=263.89..395.88 rows=1 width=2792) (actual
time=0.835..0.835 rows=0 loops=1)
Buffers: shared hit=137
-> Nested Loop Semi Join (cost=263.47..387.09 rows=1 width=2463)
(actual time=0.834..0.834 rows=0 loops=1)
Buffers: shared hit=137
-> Nested Loop (cost=262.75..360.82 rows=1 width=2463) (actual
time=0.834..0.834 rows=0 loops=1)
Buffers: shared hit=137
-> Nested Loop (cost=262.33..357.00 rows=1 width=2134)
(actual time=0.763..0.808 rows=3 loops=1)
Buffers: shared hit=125
-> Nested Loop (cost=262.05..356.66 rows=1
width=1889) (actual time=0.751..0.791 rows=3 loops=1)
Buffers: shared hit=116
-> Nested Loop (cost=261.63..348.22 rows=1
width=1514) (actual time=0.739..0.771 rows=3 loops=1)
Buffers: shared hit=104
-> Nested Loop (cost=261.35..347.89
rows=1 width=1213) (actual time=0.721..0.749 rows=3 loops=1)
Buffers: shared hit=95
-> Nested Loop Left Join
(cost=260.93..339.45 rows=1 width=1069) (actual time=0.709..0.729 rows=3
loops=1)
Buffers: shared hit=83
-> Nested Loop
(cost=260.64..331.13 rows=1 width=892) (actual time=0.702..0.717 rows=3
loops=1)
Buffers: shared hit=80
-> Bitmap Heap Scan on
"Reservation" q (cost=260.35..314.51 rows=2 width=517) (actual
time=0.689..0.695 rows=3 loops=1)
Recheck Cond:
("SearchVector" @@ to_tsquery('foobar:*'::text))
Filter:
("AccountCode" = 'A1'::text)
Rows Removed by
Filter: 6
Heap Blocks: exact=8
Buffers: shared
hit=71
-> Bitmap Index
Scan on "IX_Reservation_SearchVector" (cost=0.00..260.35 rows=13 width=0)
(actual time=0.677..0.677 rows=13 loops=1)
Index Cond:
("SearchVector" @@ to_tsquery('foobar:*'::text))
Buffers:
shared hit=63
-> Index Scan using
"PK_Booking" on "Booking" q5 (cost=0.29..8.31 rows=1 width=375) (actual
time=0.006..0.006 rows=1 loops=3)
Index Cond: ("Id" =
q."BookingId")
Filter:
("AccountCode" = 'A1'::text)
Buffers: shared
hit=9
-> Index Scan using
"PK_PaymentAccount" on "PaymentAccount" q0 (cost=0.29..8.31 rows=1
width=177) (actual time=0.003..0.003 rows=0 loops=3)
Index Cond:
(q."PaymentAccountId" = "Id")
Filter: ("AccountCode" =
'A1'::text)
Buffers: shared hit=3
-> Index Scan using "PK_NoShowFee"
on "NoShowFee" q1 (cost=0.42..8.44 rows=1 width=144) (actual
time=0.005..0.005 rows=1 loops=3)
Index Cond: ("Id" =
q."NoShowFeeId")
Filter: ("AccountCode" =
'A1'::text)
Buffers: shared hit=12
-> Index Scan using "PK_Configuration" on
"Configuration" q2 (cost=0.28..0.33 rows=1 width=301) (actual
time=0.006..0.006 rows=1 loops=3)
Index Cond: ("Id" =
q1."OriginalFeeConfigurationId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=9
-> Index Scan using "PK_CancellationFee" on
"CancellationFee" q3 (cost=0.42..8.44 rows=1 width=375) (actual
time=0.005..0.005 rows=1 loops=3)
Index Cond: ("Id" = q."CancellationFeeId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=12
-> Index Scan using "PK_CancellationPolicy" on
"CancellationPolicy" q4 (cost=0.28..0.33 rows=1 width=245) (actual
time=0.004..0.004 rows=1 loops=3)
Index Cond: ("Id" =
q3."OriginalCancellationPolicyId")
Filter: ("AccountCode" = 'A1'::text)
Buffers: shared hit=9
-> Index Scan using "PK_Person" on "Person" q6
(cost=0.42..3.78 rows=1 width=329) (actual time=0.008..0.008 rows=0
loops=3)
Index Cond: ("Id" = q5."BookerId")
Filter: (("AccountCode" = 'A1'::text) AND
("SearchVector" @@ to_tsquery('foobar:*'::text)))
Rows Removed by Filter: 1
Buffers: shared hit=12
-> Nested Loop (cost=0.71..26.26 rows=1 width=8) (never executed)
-> Index Scan using
"IX_TimeSlice_ReservationId_RatePlanId_UnitGroupId_FromUtc_T~" on
"TimeSlice" q8 (cost=0.42..24.78 rows=1 width=16) (never executed)
Index Cond: ("ReservationId" = q."Id")
Filter: ("AccountCode" = 'A1'::text)
-> Index Scan using "PK_Unit" on "Unit" q9 (cost=0.29..0.89
rows=1 width=8) (never executed)
Index Cond: ("Id" = q8."UnitId")
Filter: (("AccountCode" = 'A1'::text) AND
("SearchVector" @@ to_tsquery('foobar:*'::text)))
-> Index Scan using "PK_Person" on "Person" q7 (cost=0.42..8.69 rows=1
width=329) (never executed)
Index Cond: ("Id" = q."PrimaryGuestId")
Filter: (("AccountCode" = 'A1'::text) AND ("SearchVector" @@
to_tsquery('foobar:*'::text)))
Planning Time: 31.910 ms
Execution Time: 1.184 ms

We're working on being able to have only one "SearchVector" column, but
until then, I'd like to know how I could help postgres estimating the
execution better.

We're using Postgres 11 on RDS with the default settings

Thank you very much!
Peter

Responses

Browse pgsql-general by date

  From Date Subject
Next Message stan 2019-08-16 11:39:39 A 3 table join question
Previous Message Charles Clavadetscher 2019-08-16 04:41:43 Re: Variable constants ?