Re: SELECT statement returns in 10seconds, but INSERT/CREATE TABLE AS with same SELECT takes 7 minutes

From: Arjun Ranade <ranade(at)nodalexchange(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: SELECT statement returns in 10seconds, but INSERT/CREATE TABLE AS with same SELECT takes 7 minutes
Date: 2018-09-27 18:58:28
Message-ID: CANrrCRznE7xhi8_c=u=E14+1++D6YwHABPqSE4n3FrL6HrhftA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Tom,

Thank you for your suggestions. I tried increasing from_collapse_limit and
join_collapse_limit to 16 in a specific session and that significantly
improved my query performance (it takes < 2s now). Now, my instinct is to
increase this globally but I'm sure there are some drawbacks to this so I
will need to read more about it.

Your point about parallelism is interesting, I hadn't considered that.

Even after working with Postgres for years, there really is a lot to learn
about query optimization that is new for me. I'd never heard of these
parameters before your email since almost every performance issue I've had
thus far was resolved by creating an index or smarter query re-writing.

I'm reading the documentation regarding these specific parameters, but it's
written as a reference page as opposed to an explanation into query
planning and optimization. I wonder if there is a class or book these
details better.

Anyway, thank you so much for pointing me in the right direction.

Best,
Arjun

On Thu, Sep 27, 2018 at 1:21 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Arjun Ranade <ranade(at)nodalexchange(dot)com> writes:
> > I have a strange performance situation that I cannot resolve with my
> usual
> > process.
> > I have a SELECT statement that completes in about 12 seconds for the full
> > result (~1100 rows).
> > If I create an empty table first, and then INSERT with the SELECT query,
> it
> > takes 6.5 minutes.
>
> > When I look at the EXPLAIN ANALYZE output, it seems that it's using a
> > drastically different query plan for the INSERT+SELECT than SELECT by
> > itself.
>
> The reason for the plan shape difference is probably that the bare SELECT
> is allowed to use parallelism while INSERT/SELECT isn't. I'm not sure
> to what extent we could relax that without creating semantic gotchas.
>
> However, your real problem with either query is that the planner's
> rowcount estimates are off by several orders of magnitude. If you could
> improve that, you'd likely get better plan choices in both cases.
>
> I also notice that this seems to be a 14-way join, which means you're
> probably getting an artificially poor plan as a result of
> from_collapse_limit and/or join_collapse_limit constraining the planner's
> search space. Maybe raising those limits would help, although I'm not
> sure how much it'd help if the rowcount estimates aren't improved.
>
> Since you haven't told us much of anything about the actual query or the
> data, it's hard to offer concrete advice beyond that.
>
> regards, tom lane
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2018-09-27 19:33:14 Re: SELECT statement returns in 10seconds, but INSERT/CREATE TABLE AS with same SELECT takes 7 minutes
Previous Message Justin Pryzby 2018-09-27 18:52:24 Re: SELECT statement returns in 10seconds, but INSERT/CREATE TABLE AS with same SELECT takes 7 minutes