From: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
---|---|
To: | <fburgess(at)radiantblue(dot)com>, "'Jeff Janes'" <jeff(dot)janes(at)gmail(dot)com> |
Cc: | "'Jaime Casanova'" <jaime(at)2ndquadrant(dot)com>, "'psql performance list'" <pgsql-performance(at)postgresql(dot)org>, "'Postgres General'" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: [PERFORM] Very slow inner join query Unacceptable latency. |
Date: | 2013-05-24 05:14:44 |
Message-ID: | 005b01ce583d$9b4edfd0$d1ec9f70$@kapila@huawei.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
On Thursday, May 23, 2013 10:51 PM fburgess wrote:
> serverdb=# set enable_hashjoin=off;
> SET
> serverdb=# explain select count(*) as y0_ from SARS_ACTS this_ inner join SARS_ACTS_RUN tr1_ on this_.SARS_RUN_ID=tr1_.ID where tr1.ALGORITHM='SMAT';
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=7765563.69..7765563.70 rows=1 width=0)
> Nested Loop (cost=0.00..7765555.35 rows=3336 width=0)
> -> Index Scan using idx_sars_acts_run_algorithm on sars_acts_run tr1_ (cost=0.00..44.32 rows=650 width=8)
> Index Cond: ((algorithm)::text = 'SMAT'::text)
> -> Index Scan using idx_sars_acts_run_id_end_time on sars_acts this_ (cost=0.00..11891.29 rows=4452 width=8)
> Index Cond: (SARS_RUN_ID=tr1_.ID)
>(6 rows)
>serverdb=# \timing
>TIming is on.
>serverdb=# select count(*) as y0_ from SARS_ACTS this_ inner join SARS_ACTS_RUN tr1_ on this_.SARS_RUN_ID=tr1_.ID where tr1.ALGORITHM='SMAT';
> y0_
>------
>1481710
>(1 row)
> Time: 85069.416 ms < 1.4 minutes <-- not great, but much better!
> Subsequently, runs in the milliseconds once cached.
If I see the plan from your other mail as below where Hash join is selected, the cost of Nested Loop is much more, that is the reason why optimizer would have selected
Hash Join.
serverdb=# explain analyze select count(*) as y0_ from SARS_ACTS this_ inner join SARS_ACTS_RUN tr1_ on this_.SARS_RUN_ID=tr1_.ID where tr1_.ALGORITHM='SMAT';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3983424.05..3983424.06 rows=1 width=0) (actual time=1358298.003..1358298.004 rows=1 loops=1)
-> Hash Join (cost=44.93..3983415.81 rows=3297 width=0) (actual time=2593.768..1358041.205 rows 1481710 loops=1)
It is quite surprising that after optimizer decided the cost of some plan (Hash Join) to be lower but actual execution cost of same is more.
There might be some problem with cost calculation model of Hash Join for some cases.
By the way which version of PostgreSQL you are using?
> But what negative impact is disabling hash joins?
I think using it as a temporary fix might be okay, but keeping such code in your application might be risky for you, because as the data changes in your tables, it could be quite possible that
in future Hash Join might be the best and cheapest way.
Can you try reproducing it with small data or else can you attach your schema and data for the tables/indexes used in query?
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Jov | 2013-05-24 05:47:44 | Re: There are many processes created and died frequently during PostgreSQl idle |
Previous Message | Mike Christensen | 2013-05-24 02:57:14 | Re: Success stories of PostgreSQL implementations in different companies |
From | Date | Subject | |
---|---|---|---|
Next Message | Scott Marlowe | 2013-05-24 06:16:16 | Re: [PERFORM] Very slow inner join query Unacceptable latency. |
Previous Message | Jonathan Morra | 2013-05-24 00:21:00 | Re: Performance of complicated query |