From: | Andrew Sullivan <andrew(at)libertyrms(dot)info> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | strange estimate for number of rows |
Date: | 2003-11-13 18:14:15 |
Message-ID: | 20031113181415.GB25546@libertyrms.info |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi all,
I've one here that I cannot fathom. Any suggestions?
We have a table, call it tablename, where we're selecting by a range
of dates and an identifier. (This is redacted, obviously):
\d tablename
Column | Type | Modifiers
--------------------+--------------------------+--------------------
id | integer | not null
transaction_date | timestamp with time zone | not null
product_id | integer | not null
Indexes:
"trans_posted_trans_date_idx" btree (transaction_date, product_id)
The statistics on transaction_date and product_id are set to 1000.
Everything is all analysed nicely. But I'm getting a poor plan,
because of an estimate that the number of rows to be returned is
about double how many actually are:
explain analyse select * from transactions_posted where
transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
product_id = 2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on transactions_posted (cost=0.00..376630.33 rows=700923
width=91) (actual time=8422.253..36176.078 rows=316029 loops=1)
Filter: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp
with time zone) AND (transaction_date < '2003-10-01
00:00:00-04'::timestamp with time zone) AND (product_id = 2))
Total runtime: 36357.630 ms
(3 rows)
SET enable_seqscan = off;
explain analyse select * from transactions_posted where
transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and
product_id = 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using trans_posted_trans_date_idx on transactions_posted
(cost=0.00..1088862.56 rows=700923 width=91) (actual
time=35.214..14816.257 rows=316029 loops=1)
Index Cond: ((transaction_date >= '2003-09-01
00:00:00-04'::timestamp with time zone) AND (transaction_date <
'2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id =
2))
Total runtime: 15009.816 ms
(3 rows)
SELECT attname,null_frac,avg_width,n_distinct,correlation FROM
pg_stats where tablename = 'transactions_posted' AND attname in
('transaction_date','product_id');
attname | null_frac | avg_width | n_distinct | correlation
------------------+-----------+-----------+------------+-------------
product_id | 0 | 4 | 2 | 0.200956
transaction_date | 0 | 8 | -0.200791 | 0.289248
Any ideas? I'm loathe to recommend cluster, since the data will not
stay clustered.
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Afilias Canada Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110
From | Date | Subject | |
---|---|---|---|
Next Message | Arthur Ward | 2003-11-13 18:28:15 | Union+group by planner estimates way off? |
Previous Message | Robert Treat | 2003-11-13 17:19:42 | Re: IN surpasses NOT EXISTS in 7.4RC2 ?? |