Re: SELECT statement returns in 10seconds, but INSERT/CREATE TABLE AS with same SELECT takes 7 minutes

From: Arjun Ranade <ranade(at)nodalexchange(dot)com>
To: pryzby(at)telsasoft(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: SELECT statement returns in 10seconds, but INSERT/CREATE TABLE AS with same SELECT takes 7 minutes
Date: 2018-09-27 19:37:57
Message-ID: CANrrCRy_HSsYCTDVH+EtDoBJzFiEob3Nd5bztmno5V13WFWOjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Yes, that join is concerning (red text below). The conditions all need to
be checked so they are independent.

The query (with consistent obfuscation) is below :

select distinct
a.sale_id
, a.test_date
, a.product_id as original_product_id
,vw2.product_id
, a.volume as volume
,b.pair_rank
from not_sold_locations a
inner join vw_product vw2 using
(product_group_name,product_class_code,product_type_code,sale_end_date)
inner join product_mapping b on a.product_group_name =
b.left_product_group_name and
a.product_node_name = b.left_product_node and
a.product_type_code = b.left_product and
vw2.product_node_name = b.right_product_node and
vw2.product_group_name =
b.right_product_group_name and
vw2.product_type_code = b.right_product
inner join mapping_ref i on vw2.product_group_name || '.' ||
vw2.product_node_name = i.product_node_name and
vw2.product_class_code = i.product_class_code and
vw2.product_type_code = i.product_type_code and
vw2.sale_end_date between i.first_product_date
and i.last_product_date;

not_sold_locations(a) has 836 rows
vw_product (vw2) has 785k rows and is a view that joins 11 tables
together to have a consolidated view of all products, sales locations,
etc

product_mapping (b) has 2520 rows

mapping_ref (i) has 178 rows

On Thu, Sep 27, 2018 at 2:52 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Thu, Sep 27, 2018 at 01:08:05PM -0400, Arjun Ranade wrote:
> > When I look at the EXPLAIN ANALYZE output, it seems that it's using a
> > drastically different query plan for the INSERT+SELECT than SELECT by
> > itself.
>
> The fast, SELECT plan is using parallel query, which isn't available for
> INSERT+SELECT:
>
>
> https://www.postgresql.org/docs/current/static/when-can-parallel-query-be-used.html
> |Even when it is in general possible for parallel query plans to be
> generated, the planner will not generate them for a given query if any of
> the following are true:
> |The query writes any data or locks any database rows.
>
> Using parallel query in this case happens to mitigate the effects of the
> bad
> plan.
>
> I see Tom responded, and you got an improvement by changing join threshold.
>
> But I think you could perhaps get an better plan if the rowcount estimates
> were
> fixed. That's more important than probably anything else - changing
> settings
> is only a workaround for bad estimates.
>
> In the slow/INSERT plan, this join is returning 55000x more rows than
> expected
> (not 55k more: 55k TIMES more).
>
> 7. 26,937.132 401,503.136 ↓ 55,483.7 332,902 1
>
> Nested Loop (cost=1,516.620..42,244.240 rows=6 width=84) (actual
> time=311.021..401,503.136 rows=332,902 loops=1)
> Join Filter: (((papa_echo.oscar_bravo)::text =
> (five_hotel.tango_november)::text) AND ((papa_echo.lima_tango)::text =
> (five_hotel.lima_mike)::text) AND ((xray_juliet1.juliet)::text =
> (five_hotel.papa_victor)::text))
> Rows Removed by Join Filter: 351664882
> Buffers: shared hit=8570619 read=6
>
> First question is if all those conditions are independent? Or if one of
> those
> conditions also implies another, which is confusing the planner.
>
> Justin
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Arjun Ranade 2018-09-27 19:51:32 Re: SELECT statement returns in 10seconds, but INSERT/CREATE TABLE AS with same SELECT takes 7 minutes
Previous Message Justin Pryzby 2018-09-27 19:33:14 Re: SELECT statement returns in 10seconds, but INSERT/CREATE TABLE AS with same SELECT takes 7 minutes