Re: query of partitioned object doesnt use index in qa

From: Mike Broers <mbroers(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query of partitioned object doesnt use index in qa
Date: 2017-09-15 20:59:17
Message-ID: CAB9893g-1fpvh=0snbe7qFJKfXEsn2YxR3ZWZ6-JxrMCyaZg3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

That makes a lot of sense, thanks for taking a look. An index like you
suggest would probably further improve the query. Is that suggestion
sidestepping the original problem that production is evaluating the
landing_id bit with the partition index and qa is sequence scanning instead?

AND exists (select 1 from t_sap where e.landing_id = t_sap.landing_id)) as
rankings;

Based on the difference in row estimate I am attempting an analyze with a
higher default_statistic_target (currently 100) to see if that helps.

On Fri, Sep 15, 2017 at 3:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Mike Broers <mbroers(at)gmail(dot)com> writes:
> > If Im reading this correctly postgres thinks the partition will return
> 6.5
> > million matching rows but actually comes back with 162k. Is this a case
> > where something is wrong with the analyze job?
>
> You've got a lot of scans there that're using conditions like
>
> > │ -> Seq Scan on event__99999999 e_1
> (cost=0.00..2527828.05 rows=11383021 width=778) (actual
> time=25522.389..747238.885 rows=42 loops=1)
> > │ Filter: (((body ->> 'SID'::text) IS
> NOT NULL) AND (validation_status_code = 'P'::bpchar))
> > │ Rows Removed by Filter: 12172186
>
> While I'd expect the planner to be pretty solid on estimating the
> validation_status_code condition, it's not going to have any idea about
> that JSON field test. That's apparently very selective, but you're just
> getting a default estimate, which is not going to think that a NOT NULL
> test will exclude lots of rows.
>
> One thing you could consider doing about this is creating an index
> on (body ->> 'SID'::text), which would prompt ANALYZE to gather statistics
> about that expression. Even if the index weren't actually used in the
> plan, this might improve the estimates and the resulting planning choices
> enough to make it worth maintaining such an index.
>
> Or you could think about pulling that field out and storing it on its own.
> JSON columns are great for storing random unstructured data, but they are
> less great when you want to do relational-ish things on subfields.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andres Freund 2017-09-15 21:03:58 Re: query of partitioned object doesnt use index in qa
Previous Message Jeremy Finzel 2017-09-15 20:51:01 Indexing an array of two separate columns