From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | Sezai YILMAZ <sezai(dot)yilmaz(at)pro-g(dot)com(dot)tr> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Non-unique index performance |
Date: | 2005-06-24 09:18:42 |
Message-ID: | 42BBCFF2.3000200@archonet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Sezai YILMAZ wrote:
> Hello!
>
> I have a table eith name person as described below. It has an unique
> index for id column (it is also primary key) and has an index for parent
> column.
>
> If I run a query with where clause on id column it uses the index (look
> at the first explain analyze result; it says "Index Scan using...") and
> the query for 582856 rows table results in 225,893 ms.
>
> But, if I run another query with where clause on parent column it does
> not use the index (look at the second explain analyze result; it says
> "Seq Scan using...") and the query for 582856 rows table results in
> 11192.913 ms.
>
> Why the difference of both queries is so dramatical for unique and
> non-unique indexed columns? Why PostgreSQL does not use the non-unique
> indexes (it says that it does sequential scan)?
Because it thinks it will be faster/cheaper.
> I have to use an index on non-unique column. What is the solution for
> that? Is there a way to speed up non-unique indexes?
You don't want to force it to use an index, you want it to make better
estimates. Let's have a look...
> *****************************************************************
> test=> \d person
> Table "public.person"
> Column | Type | Modifiers
> ---------+-----------------------+-----------
> name | character varying(30) |
> surname | character varying(30) |
> id | integer | not null
> parent | integer |
> Indexes:
> "person_pkey" primary key, btree (id)
> "parent_ndx" btree (parent)
OK - all very simple. And you've said there are about 580,000 rows.
> test=> explain analyze select id,name from person where id in ('17201',
> '338191', '244319', '515209', '20415');
Why are you quoting integers?
> test=> explain analyze select * from person where parent in ('17201',
> '338191', '244319', '515209', '20415');
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>
> Seq Scan on person (cost=0.00..35871.26 rows=14427 width=32) (actual
> time=0.063..11192.809 rows=5 loops=1)
> Filter: ((parent = 17201) OR (parent = 338191) OR (parent = 244319) OR
> (parent = 515209) OR (parent = 20415))
> Total runtime: 11192.913 ms
Hmm - for some reason it's expecting 14427 rows to be returned. If there
were that many matches, then it might well be a better choice than going
back and fore between the index and the table all the time.
So - we need to find out why it thinks there will be so many rows returned.
1. VACUUM FULL ANALYSE person;
2. re-run the explain
That will make sure that the table's statistics are up-to-date. If that
hasn't helped, then perhaps we need to educate PG about the distribution
of values in "parent".
2. ALTER TABLE person ALTER COLUMN parent SET STATISTICS 100;
3. ANALYSE person;
4. re-run the explain
If that still doesn't work, keep increasing the statistics (max. value
1000).
Let us know how that works for you.
--
Richard Huxton
Archonet Ltd
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Huxton | 2005-06-24 09:20:03 | Re: Win32 users? |
Previous Message | Sezai YILMAZ | 2005-06-24 08:44:50 | Non-unique index performance |