Re: Analyze results in more expensive query plan

From: Jeremy Altavilla <jeremyaltavilla(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Analyze results in more expensive query plan
Date: 2019-05-22 22:21:38
Message-ID: CAG3Nut8TY18vt8yN8iNymebhW7yZWUQVbFS209=j-TOymhn-jg@mail.gmail.com
Lists: pgsql-performance

Thanks for the help. In our prod environment, we shouldn't be
planning unnecessarily. Our app uses the extended query protocol
(prepare/bind/exec) to call PL/pgSQL stored procedures. I left out a
lot of context and background in my question because I hoped it would
simplify things; I may have left out too much, though. Immediately
after we upgraded our prod database from Postgres 9.6 to 10, it
started using 2-3x more CPU with no change in requests per second. We
have a load test for this app / database; using it, we eventually
discovered the effect of having stats on the bag_type table. After
that, it made sense that the upgrade had triggered this, since one
step in the upgrade process is to run ANALYZE on the new cluster.
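(For anyone reproducing this: whether ANALYZE has populated
per-column stats for the table can be checked with something like the
query below; the table name is the one from this thread.)

    SELECT attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename = 'bag_type';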

I experimented with changing the per-column statistics target.
Setting the targets for name and game to 0 results in no stats for
those columns, and the planner chooses the better plan. Pretty much
any other combination of values resulted in the more expensive plan.
I'm not sure if this is fixing the problem or just hiding it, but
it's definitely less fragile than hoping the table never gets
analyzed.
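Concretely, the statements were along these lines (a sketch; the
targets for the other columns were left at their defaults):

    ALTER TABLE bag_type ALTER COLUMN name SET STATISTICS 0;
    ALTER TABLE bag_type ALTER COLUMN game SET STATISTICS 0;
    ANALYZE bag_type;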

--Thanks
--Jeremy

On Mon, May 20, 2019 at 10:04 PM David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> On Tue, 21 May 2019 at 08:23, Jeremy Altavilla
> <jeremyaltavilla(at)gmail(dot)com> wrote:
> >
> > Thanks for the suggestion. I created extended statistics objects for the two tables in question. Unfortunately the resulting plan was the same (and had the same estimates). It looks like the extended stats discovered a potentially useful correlation on bag: "2, 3 => 1" (owner_id, bag_type_id => id). I'm guessing this wasn't usable because the docs state "They are not used to improve estimates for equality conditions comparing two columns".
>
> I'd say that since the time spent planning is nearly 3x what is spent
> during execution, you're wasting your time trying to speed up the
> execution. What you should be thinking about is using PREPAREd
> statements to avoid the planning overhead completely. If that's not
> possible then you've more chance of reducing the time spent planning
> by reducing the statistics on the table rather than adding more
> planning overhead by adding extended stats. You might want to
> experiment with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ..;
> and setting those down a bit then analyzing the tables again.
> Although, that's likely only going to make a very small difference, if
> any, compared with getting rid of the planning completely.
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
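
P.S. For the archives, the extended statistics experiment quoted
above was along these lines (a sketch: the statistics name is
invented here, and the exact definition may have differed):

    CREATE STATISTICS bag_dep_stats (dependencies)
        ON id, owner_id, bag_type_id FROM bag;
    ANALYZE bag;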
