Re: Duplicate WHERE condition changes performance and plan

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: "singh400(at)gmail(dot)com" <singh400(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Duplicate WHERE condition changes performance and plan
Date: 2020-04-20 02:33:57
Message-ID: 20200420023357.GR26953@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Apr 15, 2020 at 08:55:53PM +0100, singh400(at)gmail(dot)com wrote:
> We have an odd issue where specifying the same where clause twice causes PG
> to pick a much more efficent plan. We would like to know why.

> Query B (this is the 'fast' query):
> UPDATE problem_instance SET processed = false
> FROM problem
> WHERE problem.id = problem_instance.problem_id
> AND problem.status != 2
> AND problem.status != 2
> AND processed = true;

When you specify redundant condition, it results in an underestimate, as
expected:

Index Scan using problem_id_idx1 on public.problem (cost=0.28..624.68 ROWS=73 width=14) (actual time=0.011..0.714 ROWS=841 loops=1)
Filter: ((problem.status <> 2) AND (problem.status <> 2))

In this case, doing an index scans on problem_instance is apparently faster
than an seq scan.

I think the duplicate condition is fooling the planner, and by chance it's
giving a better plan. That might indicate that your settings aren't ideal.
Maybe random_page_cost should be lower, which would encourage index scans. If
you're using SSD storage, or if the DB is small compared with shared_buffers or
RAM, then random_page_cost should be closer to seq_page_cost.

How large are the indexes? problem_id_idx1 ?

On Mon, Apr 20, 2020 at 01:50:17AM +0100, singh400(at)gmail(dot)com wrote:
> Even after upgrading my local install of PG to "PostgreSQL 12.2,
> compiled by Visual C++ build 1914, 64-bit" and I'm still seeing the
> same behaviour.

> Server Configuration:
> https://gist.github.com/indy-singh/8386d59206af042d365e5cd49fbae68f
> shared_buffers 2GB configuration file
> effective_cache_size 6GB configuration file

Note, until v10, the documentation said this:

https://www.postgresql.org/docs/9.6/runtime-config-resource.html
|Also, on Windows, large values for shared_buffers aren't as effective. You may
|find better results keeping the setting relatively low and using the operating
|system cache more instead. The useful range for shared_buffers on Windows
|systems is generally from 64MB to 512MB.

It would be interesting to know

--
Justin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rick Vincent 2020-04-20 11:10:46 RE: Postgres not using index on views
Previous Message singh400@gmail.com 2020-04-20 00:50:17 Re: Duplicate WHERE condition changes performance and plan