Re: Bad selectivity estimate when using a sub query to determine WHERE condition

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Chris Borckholder <chris(dot)borckholder(at)bitpanda(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bad selectivity estimate when using a sub query to determine WHERE condition
Date: 2020-02-10 19:13:04
Message-ID: 20200210191304.GF1412@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Feb 10, 2020 at 11:34:01AM +0100, Chris Borckholder wrote:
> I have a large table of immutable events that need to be aggregated
> regularly to derive statistics. To improve the performance, that table is
> rolled up every 15minutes, so that online checks can aggregate rolled up
> data and combine it with latest events created after the last roll up.
>
> To implement this a query is executed that selects only events after the
> time of the last rollup.
> That time is determined dynamically based on a log table.

Perhaps that could be done as an indexed column in the large table, rather
than querying a 2nd log table.
Possibly with a partial index on that column: WHERE unprocessed='t'.

> When using a sub select or CTE to get the latest roll up time, the query
> planner fails to recognize that a most of the large table would be filtered
> out by the condition and tries a sequential scan instead of an index scan.
> When using the literal value for the WHERE condition, the plan correctly
> uses an index scan, which is much faster.
>
> I analyzed the involved tables and increased the collected histogram, but
> the query plan did not improve. Is there a way to help the query planner
> recognize this in the dynamic case?

Also, if you used partitioning with pgostgres since v11, then I think most
partitions would be excluded:

https://www.postgresql.org/docs/12/release-12.html
|Allow partition elimination during query execution (David Rowley, Beena Emerson)
|Previously, partition elimination only happened at planning time, meaning many joins and prepared queries could not use partition elimination.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=499be013de65242235ebdde06adb08db887f0ea5

https://www.postgresql.org/about/featurematrix/detail/332/

Justin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Amol Tarte 2020-02-12 05:28:15 Re: Writing 1100 rows per second
Previous Message Pavel Stehule 2020-02-10 18:47:35 Re: proposal: schema variables