Quick Links

Re: strange estimate for number of rows

From:	Daniel Manley <dmanley(at)libertyrms(dot)info>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: strange estimate for number of rows
Date:	2003-11-13 19:35:58
Message-ID:	3FB3DD1E.2000509@libertyrms.info
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi, I'm the lead developer on the project this concerns (forgive my
newbiness on this list).

We tried a couple of scenarios with effective_cache_size=60000,
cpu_index_tuple_cost=0.0001 and random_page_cost=2 with no change in the
plan.

explain analyse select * from tablename where transaction_date >=
'2003-9-1' and transaction_date < '2003-10-1';
-----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on tablename (cost=0.00..348199.14 rows=1180724 width=91)
(actual time=7727.668..36286.898 rows=579238 loops=1)
Filter: ((transaction_date >= '2003-09-01 00:00:00+00'::timestamp
with time zone) AND (transaction_date < '2003-10-01
00:00:00+00'::timestamp with time zone))
Total runtime: 36625.351 ms

explain analyse select * from transactions_posted where product_id = 2;
-----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on transactions_posted (cost=0.00..319767.95 rows=6785237
width=91) (actual time=0.091..35596.328 rows=5713877 loops=1)
Filter: (product_id = 2)
Total runtime: 38685.373 ms

The product_id alone gives a difference of a millions rows from estimate
to actual, vs. the factor of 2 from the transaction_date.

Dan Manley

Tom Lane пишет:

>Andrew Sullivan <andrew(at)libertyrms(dot)info> writes:
>
>
>>The statistics on transaction_date and product_id are set to 1000.
>>Everything is all analysed nicely. But I'm getting a poor plan,
>>because of an estimate that the number of rows to be returned is
>>about double how many actually are:
>>
>>
>
>
>
>>explain analyse select * from transactions_posted where
>>transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
>>product_id = 2;
>>
>>
>
>Are the estimates accurate for queries on the two columns individually,
>ie
>... where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1'
>... where product_id = 2
>
>If so, the problem is that there's a correlation between
>transaction_date and product_id, which the system cannot model because
>it has no multi-column statistics.
>
>However, given that the estimate is only off by about a factor of 2,
>you'd still be getting the wrong plan even if the estimate were perfect,
>because the estimated costs differ by nearly a factor of 3.
>
>Given the actual runtimes, I'm thinking maybe you want to reduce
>random_page_cost. What are you using for that now?
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>

In response to

Re: strange estimate for number of rows at 2003-11-13 18:56:26 from Tom Lane

Responses

Re: strange estimate for number of rows at 2003-11-13 20:19:08 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Josh Berkus	2003-11-13 19:36:05	Storage space, RAM for NUMERIC
Previous Message	Arthur Ward	2003-11-13 19:19:07	Re: Union+group by planner estimates way off?