| From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> | 
|---|---|
| To: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | left outer join vs subplan | 
| Date: | 2007-09-05 16:11:23 | 
| Message-ID: | 46DED52B.1000609@sigaev.ru | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi!
I found two queries which do the same thing but they is very different in time. 
For test suite it's about 10^3 times, but on real data it can be 10^5 times. 
It's observed on 8.1-current, 8.2-current and CVS HEAD versions. Interesting 
that even without LIMIT clause they take approximately the same time, but costs 
is differ in 30 times. Is any way to tweaking pgsql to produce more reasonable 
plan for first query?
This query is auto-generated, so they may be more complex and I choose simplest 
example.
First query:
explain analyze
select *
from
     a
     left outer join (
         select b.id, sum(b.val)
         from b
         group by b.id
     ) bagg
         on bagg.id = a.id
where
     a.id > 10000
order by a.addon, a.id
limit 100;
  Limit  (cost=9923.36..9923.61 rows=100 width=20) (actual 
time=2232.437..2233.273 rows=100 loops=1)
    ->  Sort  (cost=9923.36..10031.41 rows=43221 width=20) (actual 
time=2232.428..2232.709 rows=100 loops=1)
          Sort Key: a.addon, a.id
          Sort Method:  top-N heapsort  Memory: 24kB
          ->  Merge Right Join  (cost=0.00..8271.48 rows=43221 width=20) (actual 
time=313.198..2052.559 rows=40000 loops=1)
                Merge Cond: (b.id = a.id)
                ->  GroupAggregate  (cost=0.00..5725.41 rows=53292 width=12) 
(actual time=0.266..1422.522 rows=50000 loops=1)
                      ->  Index Scan using bidx on b  (cost=0.00..4309.26 
rows=150000 width=12) (actual time=0.217..547.402 rows=150000 loops=1)
                ->  Index Scan using a1idx on a  (cost=0.00..1256.90 rows=40551 
width=8) (actual time=0.171..155.073 rows=40000 loops=1)
                      Index Cond: (a.id > 10000)
  Total runtime: 2233.940 ms
Second query:
explain analyze
select
     a.id,
     (
         select sum(b.val)
         from b
         where b.id = a.id
     ) as val
from a
where
     id > 10000
order by a.addon, a.id
limit 100;
  Limit  (cost=0.00..839.04 rows=100 width=8) (actual time=0.339..7.436 rows=100 
loops=1)
    ->  Index Scan using a2idx on a  (cost=0.00..340241.08 rows=40551 width=8) 
(actual time=0.332..6.865 rows=100 loops=1)
          Index Cond: (id > 10000)
          SubPlan
            ->  Aggregate  (cost=8.33..8.34 rows=1 width=8) (actual 
time=0.048..0.051 rows=1 loops=100)
                  ->  Index Scan using bidx on b  (cost=0.00..8.32 rows=3 
width=8) (actual time=0.016..0.027 rows=3 loops=100)
                        Index Cond: (id = $0)
How to reproduce:
select generate_series as id, (random()*100)::int as addon into a
     from generate_series(1,50000);
create unique index a1idx on a (id);
create unique index a2idx on a (addon, id);
select
     id, random() as val into b
     from generate_series(1,50000) as id , generate_series(1,3) as foo;
create index bidx on b (id);
vacuum analyze;
-- 
Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru
                                                    WWW: http://www.sigaev.ru/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Decibel! | 2007-09-05 16:23:46 | Oddity with psql \d and pg_table_is_visible | 
| Previous Message | Tom Lane | 2007-09-05 16:01:48 | Re: HEAD build troubles, buildfarm misconfigurations |