From: | "David Witham" <davidw(at)unidial(dot)com(dot)au> |
---|---|
To: | <pgsql-sql(at)postgresql(dot)org> |
Subject: | Indexes and statistics |
Date: | 2004-02-18 04:43:07 |
Message-ID: | CFA248776934FD43847E740E43C346D199DC13@ozimelb03.ozicom.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi all,
This is a further post from last week. I've got a table of phone call detail records.
buns=# select count(*) from cdr;
count
---------
2800653
(1 row)
One of the customers is quite large (8.3% of the records):
buns=# select count(*) from cdr where cust_id = 99201110;
count
--------
231889
(1 row)
I have indexes on cust_id (integer) and bill_id (integer). If I try to do a query on that customer it doesn't use the index:
buns=# explain analyse select count(*) from cdr where cust_id = 99201110 and bill_id is null;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=87082.81..87082.81 rows=1 width=0) (actual time=82279.63..82279.63 rows=1 loops=1)
-> Seq Scan on cdr (cost=0.00..87037.71 rows=18041 width=0) (actual time=82279.61..82279.61 rows=0 loops=1)
Filter: ((cust_id = 99201110) AND (bill_id IS NULL))
Total runtime: 82280.19 msec
(4 rows)
I tried this:
alter table cdr alter column cust_id set statistics 1000;
alter table cdr alter column bill_id set statistics 1000;
analyze verbose;
The I ran the query again but I still got the same result.
Then I tried disabling sequential scans in postgresql.conf, restarted the postmaster and did the query again:
buns=# explain analyse select count(*) from cdr where cust_id = 99201110 and bill_id is null;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=913498.60..913498.60 rows=1 width=0) (actual time=48387.91..48387.91 rows=1 loops=1)
-> Index Scan using cdr_ix3 on cdr (cost=0.00..913453.49 rows=18041 width=0) (actual time=48387.89..48387.89 rows=0 loops=1)
Index Cond: (cust_id = 99201110)
Filter: (bill_id IS NULL)
Total runtime: 48388.47 msec
(5 rows)
The computed cost of using the index was a factor of 10 higher which I presume is why the query planner wasn't using the index, but it ran in half the time. So I guess I need to know how to alter the statistics collection so that the index will get used. I gather that index columns that occur in more than "a few" percent of the table cause the query planner to not use the index. Does that mean I won't be able to get the query planner to ever use the cust_id index for that customer or can I tune some parameters to alter that?
Any suggestions appreciated.
Thanks,
David
David Witham
Telephony Platforms Architect
Unidial, Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2004-02-18 05:10:05 | Re: Indexes and statistics |
Previous Message | Tom Lane | 2004-02-18 03:53:17 | Re: CHAR(n) always trims trailing spaces in 7.4 |