Weird row estimate

From: Marc Cousin <mcousin(at)sigma(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Weird row estimate
Date: 2007-07-11 15:28:43
Message-ID: 200707111728.43416.mcousin@sigma.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I'm having a weird problem on a query :
I've simplified it to get the significant part (see end of message).
The point is I've got a simple
SELECT field FROM table WHERE 'condition1'
Estimated returned rows : 5453
Then
SELECT field FROM table WHERE 'condition2'
Estimated returned rows : 705
Then
SELECT field FROM table WHERE 'condition1' OR 'condition2'
Estimated returned rows : 143998

Condition2 is a bit complicated (it's a subquery).
Nevertheless, shouldn't the third estimate be smaller or equal to the sum of the two others ?

Postgresql is 8.2.4 on Linux, stats are up to date,
show default_statistics_target;
default_statistics_target
---------------------------
1000

Any ideas ?

explain analyze
SELECT stc.CMD_ID
FROM STOL_STC stc
WHERE (stc.STC_DATE>='2007-07-05' AND stc.STC_DATEPLAN<='2007-07-05');

QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on stol_stc stc (cost=0.00..24265.15 rows=5453 width=8) (actual time=17.186..100.941 rows=721 loops=1)
Filter: ((stc_date >= '2007-07-05'::date) AND (stc_dateplan <= '2007-07-05'::date))
Total runtime: 101.656 ms
(3 rows)

explain analyze
SELECT stc.CMD_ID
FROM STOL_STC stc
WHERE stc.STC_ID IN
(SELECT STC_ID FROM STOL_TRJ
WHERE TRJ_DATEARRT>='2007-07-05'
AND TRJ_DATEDEPT>=TRJ_DATEARRT
AND (TRJ_DATEDEPT<='2007-07-05'
OR TRJ_DATECREAT<='2007-07-05') );

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4649.62..10079.52 rows=705 width=8) (actual time=6.266..13.037 rows=640 loops=1)
-> HashAggregate (cost=4649.62..4657.13 rows=751 width=8) (actual time=6.242..6.975 rows=648 loops=1)
-> Index Scan using stol_trj_fk5 on stol_trj (cost=0.00..4647.61 rows=803 width=8) (actual time=0.055..4.901 rows=688 loops=1)
Index Cond: (trj_datearrt >= '2007-07-05'::date)
Filter: ((trj_datedept >= trj_datearrt) AND ((trj_datedept <= '2007-07-05'::date) OR (trj_datecreat <= '2007-07-05'::date)))
-> Index Scan using stol_stc_pk on stol_stc stc (cost=0.00..7.21 rows=1 width=16) (actual time=0.004..0.005 rows=1 loops=648)
Index Cond: (stc.stc_id = stol_trj.stc_id)
Total runtime: 13.765 ms
(8 rows)

explain analyze
SELECT stc.CMD_ID
FROM STOL_STC stc
WHERE (stc.STC_DATE>='2007-07-05' AND stc.STC_DATEPLAN<='2007-07-05')
OR
(stc.STC_ID IN
(SELECT STC_ID FROM STOL_TRJ
WHERE TRJ_DATEARRT>='2007-07-05'
AND TRJ_DATEDEPT>=TRJ_DATEARRT
AND (TRJ_DATEDEPT<='2007-07-05'
OR TRJ_DATECREAT<='2007-07-05') ));

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on stol_stc stc (cost=4649.62..29621.12 rows=143998 width=8) (actual time=21.564..146.365 rows=1048 loops=1)
Filter: (((stc_date >= '2007-07-05'::date) AND (stc_dateplan <= '2007-07-05'::date)) OR (hashed subplan))
SubPlan
-> Index Scan using stol_trj_fk5 on stol_trj (cost=0.00..4647.61 rows=803 width=8) (actual time=0.054..4.941 rows=688 loops=1)
Index Cond: (trj_datearrt >= '2007-07-05'::date)
Filter: ((trj_datedept >= trj_datearrt) AND ((trj_datedept <= '2007-07-05'::date) OR (trj_datecreat <= '2007-07-05'::date)))
Total runtime: 147.407 ms

SELECT count(*) from stol_stc ;
count
--------
140960
(1 row)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andrew Sullivan 2007-07-11 15:33:37 Re: best use of an EMC SAN
Previous Message smiley2211 2007-07-11 15:21:40 TIMING A QUERY ???