Re: demystifying nested loop vs. merge join query plan choice

From: Sandeep Gupta <gupta(dot)sandeep(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: demystifying nested loop vs. merge join query plan choice
Date: 2013-07-31 19:34:17
Message-ID: CAAywg7v_57K5UYxcrsTSpTWscQA7O1R3Sp2znzHg_nVsMd7i=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

details regarding buffer usage:

for database 1:

QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual
time=9149.366..9149.366 rows=1 loops=1)
Buffers: shared hit=137991
-> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual
time=0.091..9075.008 rows=998038 loops=1)
Merge Cond: (pc.did = tc.did)
Buffers: shared hit=137991
-> Index Only Scan using pc_did_idx on pc (cost=0.00..12987.04
rows=499616 width=4) (actual time=0.017..58.237 rows=499616 loops=1)
Heap Fetches: 0
Buffers: shared hit=1369
-> Index Only Scan using tc_did_idx on tc (cost=0.00..1298125.32
rows=49987616 width=4) (actual time=0.015..5301.727 rows=49997291 loops=1)
Heap Fetches: 0
Buffers: shared hit=136622
Total runtime: 9149.414 ms
(12 rows)

for database 2:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual
time=2386.865..2386.865 rows=1 loops=1)
Buffers: shared hit=2235978 read=208446
-> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0) (actual
time=0.049..2292.338 rows=1000834 loops=1)
Buffers: shared hit=2235978 read=208446
-> Index Only Scan using pc_did_idx on pc (cost=0.00..15224.56
rows=500384 width=4) (actual time=0.016..108.407 rows=500384 loops=1)
Heap Fetches: 500384
Buffers: shared hit=6 read=3579
-> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44
rows=3 width=4) (actual time=0.003..0.004 rows=2 loops=500384)
Index Cond: (did = pc.did)
Heap Fetches: 1000834
Buffers: shared hit=2235972 read=204867
Total runtime: 2386.914 ms
(12 rows)

On Wed, Jul 31, 2013 at 3:16 PM, Sandeep Gupta <gupta(dot)sandeep(at)gmail(dot)com>wrote:

> Hi Pavel,
>
> Yes. The postgresql.conf is exactly the same. The have the same index
> and clustering and are on the same compute node as well but running on
> different ports.
>
> -Sandeep
>
>
>
> On Wed, Jul 31, 2013 at 3:14 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:
>
>> Hello
>>
>> do you have same configuration?
>>
>> Regards
>>
>> Pavel
>>
>> 2013/7/31 Sandeep Gupta <gupta(dot)sandeep(at)gmail(dot)com>:
>> > I have two postgres instances each with a database of same schema. The
>> > dataset in both is ''same'' but for randomness i.e. both contain two
>> tables
>> > pc(did) and tc(pid, did) that have almost
>> > same number of rows and have been generate from same distribution.
>> >
>> > However the query plan for the join turns out to be completely
>> different: on
>> > one join takes 2.3 secs while on the other it takes 7 secs.
>> >
>> >
>> > Here are the statistics:
>> >
>> > for database 1:
>> > size of tc table: 49987585
>> > size of pc table: 499616
>> >
>> > join plan:
>> >
>> > QUERY PLAN
>> >
>> -------------------------------------------------------------------------------------------------------------------------------------------------------
>> > Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual
>> > time=8473.296..8473.296 rows=1 loops=1)
>> > -> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual
>> > time=0.084..8409.065 rows=998038 loops=1)
>> > Merge Cond: (pc.did = tc.did)
>> > -> Index Only Scan using pc_did_idx on pc
>> (cost=0.00..12987.04
>> > rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
>> > Heap Fetches: 0
>> > -> Index Only Scan using tc_did_idx on tc
>> (cost=0.00..1298125.32
>> > rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291
>> loops=1)
>> > Heap Fetches: 0
>> > Total runtime: 8473.337 ms
>> > '
>> >
>> > Query Running time: 5135
>> >
>> >
>> > for database 2:
>> > size of tc table: 50012415
>> > size of pc table: 500384
>> >
>> > QUERY
>> > PLAN
>> >
>> ------------------------------------------------------------------------------------------------------------------------------------------------
>> > Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual
>> > time=2501.970..2501.970 rows=1 loops=1)
>> > -> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0)
>> (actual
>> > time=0.038..2418.766 rows=1000834 loops=1)
>> > -> Index Only Scan using pc_did_idx on pc
>> (cost=0.00..15224.56
>> > rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
>> > Heap Fetches: 500384
>> > -> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44
>> > rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
>> > Index Cond: (did = pc.did)
>> > Heap Fetches: 1000834
>> > Total runtime: 2502.017 ms
>> >
>> > Query running time: 2090.388 ms
>> >
>> > My question is why is the query plan so different for two datasets that
>> are
>> > really exactly the same. And how can i force the plan to be nested index
>> > scan on
>> > database 1 .
>> >
>> >
>> > -Sandeep
>> >
>>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2013-07-31 19:55:44 Re: demystifying nested loop vs. merge join query plan choice
Previous Message Sandeep Gupta 2013-07-31 19:16:20 Re: demystifying nested loop vs. merge join query plan choice