Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

From: Martin Kováčik <kovacik(at)redbyte(dot)eu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
Date: 2019-04-25 20:19:33
Message-ID: CABp97hP+m6NQwr=4u=AH2baRn--LVCAWA9b+Ybu=zigcOGMBXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Tom,

that is a great explanation. The default plan, which overestimates number
of rows is much better than the plan (which is based on wrong stats) that
underestimates it. This is true for my particular case but may be bad for
other cases.

The schema is created before the tests begin so the suggestion to create
required tables inside a transaction isn't feasible for my case. For the
build server I'm going with Michael's suggestion to disable autovacuum.
Instead I'll do manual vacuuming/analyzing before build starts.

For the production use case I'll do my own analyze after bulk insert as you
suggested.

Initially I thought that inside a transaction stats are automatically
updated after the inserts because the query ran much faster than the query
which was based on statistics with no rows. I did not realize the planner
chooses a "default" plan when there are no stats.

*Martin Kováčik*
*CEO*
*redByte*, s.r.o.
+421 904 236 791
kovacik(at)redbyte(dot)eu, www.redbyte.eu <http://redbyte.eu>

On Thu, Apr 25, 2019 at 9:46 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> =?UTF-8?B?TWFydGluIEtvdsOhxI1paw==?= <kovacik(at)redbyte(dot)eu> writes:
> > To illustrate my situation let's consider my tests look like this:
>
> > BEGIN;
>
> > -- A: insert data for the test
>
> > -- B: i'll refer to this point later
>
> > -- C: select(s)
>
> > ROLLBACK;
>
> > Everything is fine, until autovacuum (analyze) runs when the test is at
> > point B. After that the query planner at point C chooses wrong plan and
> the
> > query takes a long time to complete, blocking one CPU core for a long
> time.
> > It seems like the planner statistics inside running transaction are
> > affected by analyze task running outside of the transaction.
>
> Yup, they are. However, you're already at risk of a pretty bad plan for
> a case like this, since (by assumption) the stats before you did the
> insert at step A are radically different from what they should be after
> the insert.
>
> The standard recommendation, when you need the results of a data change
> to be understood by the planner immediately, is to do your own ANALYZE:
>
> BEGIN;
>
> -- A: insert data for the test
>
> ANALYZE test_table;
>
> -- C: select(s)
>
> ROLLBACK;
>
> This should protect step C against seeing any irrelevant stats, because
>
> (a) once your transaction has done an ANALYZE, autovacuum shouldn't
> think the stats are out of date, and
>
> (b) even if it does, your transaction is now holding
> ShareUpdateExclusiveLock on the table so auto-ANALYZE can't get
> that lock to do a fresh ANALYZE.
>
>
> Another thing you could do is not even allow the test table to exist
> outside your transaction:
>
> BEGIN;
>
> CREATE TABLE test_table (...);
>
> -- A: insert data for the test
>
> ANALYZE test_table; -- this is now somewhat optional
>
> -- C: select(s)
>
> ROLLBACK;
>
> Then there's nothing for auto-ANALYZE to get its hands on. If you're
> satisfied with the planner's default behavior in the absence of any
> stats, you could omit the post-insertion ANALYZE in this case. But
> I'm not sure that that would represent a test that has much to do with
> production situations.
>
> regards, tom lane
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Martin Kováčik 2019-04-25 20:28:51 Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
Previous Message Adrian Klaver 2019-04-25 19:58:16 Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction