Re: query of partitioned object doesnt use index in qa

From: Mike Broers <mbroers(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query of partitioned object doesnt use index in qa
Date: 2017-09-25 16:21:41
Message-ID: CAB9893gy7GC3S-_4raZ6b=Mpxn+j37X8j9hacwiicENsEOTojw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Very helpful thank you for the additional insight - I'd never checked into
pg_stats and that does reveal a difference in the distribution of the
validation_status_code between qa and production:

prod:
│ most_common_vals │ {P,F} │
│ most_common_freqs │ {0.925967,0.000933333} │
│ histogram_bounds │ ❏ │
│ correlation │ 0.995533 │

qa:
│ most_common_vals │ {P} │
│ most_common_freqs │ {0.861633} │
│ histogram_bounds │ ❏ │
│ correlation │ 0.999961 │

so the way I am reading this is that there is likely no sensible way to
avoid postgres thinking it will just have to scan the whole table because
of these statistics. I can force it by setting session parameters for this
particular query but I probably shouldnt be looking at system settings to
brutally force random fetches.

thanks again for the assistance!

On Wed, Sep 20, 2017 at 6:05 PM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On 21 September 2017 at 04:15, Mike Broers <mbroers(at)gmail(dot)com> wrote:
> > Ultimately I think this is just highlighting the need in my environment
> to
> > set random_page_cost lower (we are on an SSD SAN anyway..), but I dont
> think
> > I have a satisfactory reason by the row estimates are so bad in the QA
> > planner and why it doesnt use that partition index there.
>
> Without the index there are no stats to allow the planner to perform a
> good estimate on "e.body->>'SID' is not null", so it applies a default
> of 99.5%. So, as a simple example, if you have a partition with 1
> million rows. If you apply 99.5% to that you get 995000 rows. Now if
> you add the selectivity for "e.validation_status_code = 'P' ", let's
> say that's 50%, the row estimate for the entire WHERE clause would be
> 497500 (1000000 * 0.995 * 0.5). Since the 99.5% is applied in both
> cases, then the only variable part is validation_status_code. Perhaps
> validation_status_code = 'P' is much more common in QA than in
> production.
>
> You can look at the stats as gathered by ANALYZE with:
>
> \x on
> select * from pg_stats where tablename = 'event__99999999' and attname
> = 'validation_status_code';
> \x off
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Subramaniam C 2017-09-28 08:19:30 Slow query in JDBC
Previous Message Tobias Gierke 2017-09-25 11:10:56 Parallel sequential scan not supported for stored procedure with RETURN QUERY EXECUTE ?