Is it possible to specify minimum number of rows planner should consider?

From: Timothy Garnett <tgarnett(at)panjiva(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Is it possible to specify minimum number of rows planner should consider?
Date: 2020-09-28 20:12:37
Message-ID: CAPcyiQ2Nbw+caY0y+Gywf+58dmVkpuqPSqJuDLcoL90DbnwA7w@mail.gmail.com

Is there some way to tell the planner that, unless it's guaranteed by a
constraint or some such, it shouldn't assume that the selectivity of a
filter/anti-join is exactly 1 row (e.g. the minimum estimate to consider is
2 rows unless it's guaranteed to be 1 row), or to otherwise make it more
conservative about worst-case possibilities? I feel like this would cover
something like a third of the more problematic planner performance issues I
run into: the kind where a query suddenly runs 60,000 times slower than it
did previously. I can live with some queries being slightly slower if I can
avoid the case where they all of a sudden never complete.
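
To make the failure mode concrete, here is a toy sketch (hypothetical
tables and sizes, not my real schema) of an anti-join between two nearly
identical unique integer columns whose estimate collapses to a single row:

  -- two stand-in tables with unique, mostly overlapping integer columns
  CREATE TABLE tbl_a (id  integer PRIMARY KEY);
  CREATE TABLE tbl_b (ref integer PRIMARY KEY);
  INSERT INTO tbl_a SELECT g FROM generate_series(1, 1000000) g;
  -- tbl_b covers a slightly wider range but skips every 200th value, so
  -- roughly 5,000 rows of tbl_a genuinely have no match
  INSERT INTO tbl_b SELECT g FROM generate_series(1, 1010000) g WHERE g % 200 <> 0;
  ANALYZE tbl_a; ANALYZE tbl_b;
  -- with tbl_b's distinct count at least as large as tbl_a's, the planner
  -- tends to assume every tbl_a row has a match and clamps the anti-join
  -- estimate to rows=1, even though thousands of rows actually qualify
  EXPLAIN SELECT * FROM tbl_a a
  WHERE NOT EXISTS (SELECT 1 FROM tbl_b b WHERE b.ref = a.id);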

My current motivating example is this (... abridged) query:

PostgreSQL 11.7 on Ubuntu Linux

-> **Nested Loop Left Join** (cost=3484616.45..5873755.65 rows=1 width=295)
   Join Filter: (hsc.c_field = c.id)
   ...
   -> *Nested Loop Left Join (cost=1072849.19..3286800.99 rows=1 width=190)*
      -> Hash Anti Join (cost=1072848.62..3286798.53 ***rows=1***[actually 65k] width=189)
         Hash Cond: (c.id = trc.field)
         -> Seq Scan on c (cost=0.00..1096064.73 rows=14328573 width=189)
         -> Hash (cost=830118.31..830118.31 rows=14794985 width=4)
            -> Seq Scan on trc (cost=0.00..830118.31 rows=14794985 width=4)
      -> Index Scan using con_pkey on con (cost=0.56..2.46 rows=1 width=9)
         Index Cond: (c.con_field = id)
   ...
   -> Unique (cost=2411766.83..2479065.82 rows=4794957 width=29)
      -> Sort (cost=2411766.83..2445416.33 rows=13459797 width=29)
         Sort Key: hsc.c_field, xxx
         -> Hash Join (cost=11143.57..599455.83 rows=13459797 width=29)
            ...

*** is where the planner is off in its row estimate
c.id is unique for that table, statistics target set to 10k and freshly
analyzed (see the commands sketched below)
trc.field is unique for that table, statistics target set to 10k and
freshly analyzed
row estimates for those tables are pretty close to correct (within a couple
of %)
there is no foreign key constraint between those two tables
c.id and trc.field are both integers with pretty similar distributions over
1...22 million
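
For reference, the statistics targets were raised with something along
these lines (a sketch from memory; the 10k value is as noted above):

  ALTER TABLE c   ALTER COLUMN id    SET STATISTICS 10000;
  ALTER TABLE trc ALTER COLUMN field SET STATISTICS 10000;
  ANALYZE c;
  ANALYZE trc;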

** is where it picks a disastrous join plan based on that mistaken row
estimate
this has to be a close call with doing a merge join instead, since the
other side is already sorted

* this join is ok, since even if it isn't the fastest join here with the
correct row count, given the index it's not much worse

I can work around this by breaking up the query (e.g. creating a temporary
table of the selected ids, analyzing it, then using it in the rest of the
query) or by temporarily disabling nested-loop joins (which makes other
parts of the query slower, but not dramatically so), but is there some
other reasonable, proactive way to avoid it? The query ran fine for a year
before blowing up (the trigger, I suspect, was the trc table growing larger
than the c table; originally it was smaller), and I hit similar kinds of
issues every so often.
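
For what it's worth, the two workarounds look roughly like this (a sketch
only; the real query is much larger and c_ids is just a placeholder name):

  -- workaround 1: materialize the problematic anti-join so the planner gets
  -- real statistics for it, then join against that in the rest of the query
  CREATE TEMP TABLE c_ids AS
    SELECT c.id FROM c
    WHERE NOT EXISTS (SELECT 1 FROM trc WHERE trc.field = c.id);
  ANALYZE c_ids;
  -- ...the rest of the query then uses c_ids in place of the anti-join...

  -- workaround 2: disable nested-loop joins for just this query
  SET enable_nestloop = off;
  -- ...run the query...
  RESET enable_nestloop;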

Tim
