Re: demystifying nested loop vs. merge join query plan choice

From: Sandeep Gupta <gupta(dot)sandeep(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: demystifying nested loop vs. merge join query plan choice
Date: 2013-07-31 19:16:20
Message-ID: CAAywg7tcCGiDqNWBPdYxk4a7RGSVES2j4QJA461BU4qs=cC4nA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Pavel,

Yes. The postgresql.conf is exactly the same. The have the same index and
clustering and are on the same compute node as well but running on
different ports.

-Sandeep

On Wed, Jul 31, 2013 at 3:14 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:

> Hello
>
> do you have same configuration?
>
> Regards
>
> Pavel
>
> 2013/7/31 Sandeep Gupta <gupta(dot)sandeep(at)gmail(dot)com>:
> > I have two postgres instances each with a database of same schema. The
> > dataset in both is ''same'' but for randomness i.e. both contain two
> tables
> > pc(did) and tc(pid, did) that have almost
> > same number of rows and have been generate from same distribution.
> >
> > However the query plan for the join turns out to be completely
> different: on
> > one join takes 2.3 secs while on the other it takes 7 secs.
> >
> >
> > Here are the statistics:
> >
> > for database 1:
> > size of tc table: 49987585
> > size of pc table: 499616
> >
> > join plan:
> >
> > QUERY PLAN
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> > Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual
> > time=8473.296..8473.296 rows=1 loops=1)
> > -> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual
> > time=0.084..8409.065 rows=998038 loops=1)
> > Merge Cond: (pc.did = tc.did)
> > -> Index Only Scan using pc_did_idx on pc (cost=0.00..12987.04
> > rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
> > Heap Fetches: 0
> > -> Index Only Scan using tc_did_idx on tc
> (cost=0.00..1298125.32
> > rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291
> loops=1)
> > Heap Fetches: 0
> > Total runtime: 8473.337 ms
> > '
> >
> > Query Running time: 5135
> >
> >
> > for database 2:
> > size of tc table: 50012415
> > size of pc table: 500384
> >
> > QUERY
> > PLAN
> >
> ------------------------------------------------------------------------------------------------------------------------------------------------
> > Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual
> > time=2501.970..2501.970 rows=1 loops=1)
> > -> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0) (actual
> > time=0.038..2418.766 rows=1000834 loops=1)
> > -> Index Only Scan using pc_did_idx on pc (cost=0.00..15224.56
> > rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
> > Heap Fetches: 500384
> > -> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44
> > rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
> > Index Cond: (did = pc.did)
> > Heap Fetches: 1000834
> > Total runtime: 2502.017 ms
> >
> > Query running time: 2090.388 ms
> >
> > My question is why is the query plan so different for two datasets that
> are
> > really exactly the same. And how can i force the plan to be nested index
> > scan on
> > database 1 .
> >
> >
> > -Sandeep
> >
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sandeep Gupta 2013-07-31 19:34:17 Re: demystifying nested loop vs. merge join query plan choice
Previous Message Pavel Stehule 2013-07-31 19:14:03 Re: demystifying nested loop vs. merge join query plan choice