From: | Emi Lu <emilu(at)encs(dot)concordia(dot)ca> |
---|---|
To: | Michael Fuhr <mike(at)fuhr(dot)org> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: About when we should setup index? |
Date: | 2006-03-07 14:54:11 |
Message-ID: | 440D9E93.9010007@encs.concordia.ca |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Thank you very much Michael. Your inputs are very helpful for me. Just
have one small question, the example you gave is based on postgresql
8.1, does it apply to PostgreSQL 8.0.1 as well (i686-pc-linux-gnu,
compiled by GCC gcc 3.3.2)?
- Emi
>>. id is the primary key, so a default unique index is generated
>>automatically ?
>>
>>
>
>Yes. When you issue the CREATE TABLE statement you should see a
>notice like the following:
>
>CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
>
>You can see the index if you look at the table's description, such
>as when doing "\d test" in psql.
>
>
>
>>. if queries based on name are often, index should be setup for name?
>> if there are 30,000 records, 29,000 records' names are different, will
>>the index for name still be useful?
>>
>>
>
>If you have 29,000 unique names out of 30,000 rows then an index
>should definitely speed up queries by name.
>
>
>
>>. possible values for sex are F/M and null, should we setup index for sex?
>>
>>
>
>Probably not, although 8.1 can make better use of indexes on
>low-cardinality columns than previous versions could. If you're
>using 8.1 then try running typical queries with and without such
>an index to see if it makes much difference. EXPLAIN ANALYZE will
>show whether the index is being used. Unless you see a significant
>improvement in query performance then don't bother indexing this
>column.
>
>
>
>>. How about index for date and timestamp?
>>
>>
>
>Probably, if you regularly query on those columns.
>
>
>
>>Basically, I'd like to know is there a percentage of the differences
>>among data to decide whether index will help or not? For example, among
>>30,000 records, for a column, its value choices are less than A% and
>>greater than B% so that we know index will help a lot?
>>
>>
>
>There's no absolute rule; among other things physical order on disk
>influences the planner's decision to use an index. If a table is
>clustered on a particular index then the planner might use that
>index to fetch 80% of the table's rows, but if the data is randomly
>scattered then the planner might prefer a sequential scan to fetch
>only 3% of the rows.
>
>If you regularly query a column that has more than a handful of
>distinct values then queries will probably benefit from an index
>on that column; in 8.1 even queries against low-cardinality columns
>might benefit from an index. If you perform a lot of queries for
>values that are close together, or if queries for a certain value
>tend to return a lot of rows, then consider clustering the table
>on that column's index (and be sure to run ANALYZE afterwards to
>update the table's statistics). Indexes have costs, however: they
>take up disk space and they can slow down other operations like
>inserts and updates because each index has to be updated as well.
>Create whatever indexes you need to realize a significant improvement
>in query performance, but don't overdo it.
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2006-03-07 14:57:30 | Re: pg_dump error - filesystem full |
Previous Message | Tom Lane | 2006-03-07 14:48:02 | Re: real - integer type cast in prepared statements |