From: | Sandeep Gupta <gupta(dot)sandeep(at)gmail(dot)com> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: demystifying nested loop vs. merge join query plan choice |
Date: | 2013-07-31 19:16:20 |
Message-ID: | CAAywg7tcCGiDqNWBPdYxk4a7RGSVES2j4QJA461BU4qs=cC4nA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Pavel,
Yes. The postgresql.conf is exactly the same. The have the same index and
clustering and are on the same compute node as well but running on
different ports.
-Sandeep
On Wed, Jul 31, 2013 at 3:14 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:
> Hello
>
> do you have same configuration?
>
> Regards
>
> Pavel
>
> 2013/7/31 Sandeep Gupta <gupta(dot)sandeep(at)gmail(dot)com>:
> > I have two postgres instances each with a database of same schema. The
> > dataset in both is ''same'' but for randomness i.e. both contain two
> tables
> > pc(did) and tc(pid, did) that have almost
> > same number of rows and have been generate from same distribution.
> >
> > However the query plan for the join turns out to be completely
> different: on
> > one join takes 2.3 secs while on the other it takes 7 secs.
> >
> >
> > Here are the statistics:
> >
> > for database 1:
> > size of tc table: 49987585
> > size of pc table: 499616
> >
> > join plan:
> >
> > QUERY PLAN
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> > Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual
> > time=8473.296..8473.296 rows=1 loops=1)
> > -> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual
> > time=0.084..8409.065 rows=998038 loops=1)
> > Merge Cond: (pc.did = tc.did)
> > -> Index Only Scan using pc_did_idx on pc (cost=0.00..12987.04
> > rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
> > Heap Fetches: 0
> > -> Index Only Scan using tc_did_idx on tc
> (cost=0.00..1298125.32
> > rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291
> loops=1)
> > Heap Fetches: 0
> > Total runtime: 8473.337 ms
> > '
> >
> > Query Running time: 5135
> >
> >
> > for database 2:
> > size of tc table: 50012415
> > size of pc table: 500384
> >
> > QUERY
> > PLAN
> >
> ------------------------------------------------------------------------------------------------------------------------------------------------
> > Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual
> > time=2501.970..2501.970 rows=1 loops=1)
> > -> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0) (actual
> > time=0.038..2418.766 rows=1000834 loops=1)
> > -> Index Only Scan using pc_did_idx on pc (cost=0.00..15224.56
> > rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
> > Heap Fetches: 500384
> > -> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44
> > rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
> > Index Cond: (did = pc.did)
> > Heap Fetches: 1000834
> > Total runtime: 2502.017 ms
> >
> > Query running time: 2090.388 ms
> >
> > My question is why is the query plan so different for two datasets that
> are
> > really exactly the same. And how can i force the plan to be nested index
> > scan on
> > database 1 .
> >
> >
> > -Sandeep
> >
>
From | Date | Subject | |
---|---|---|---|
Next Message | Sandeep Gupta | 2013-07-31 19:34:17 | Re: demystifying nested loop vs. merge join query plan choice |
Previous Message | Pavel Stehule | 2013-07-31 19:14:03 | Re: demystifying nested loop vs. merge join query plan choice |