From: | "Michael Richards" <michael(at)fastmail(dot)ca> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org |
Subject: | Index selection on a large table |
Date: | 2000-07-25 23:03:38 |
Message-ID: | 397E1CCA.000017.47769@frodo.searchcanada.ca |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-sql |
Hi.
I believe this to be a bug, but I am submitting it to the SQL list as
well in the case I overlooked something.
I'm running
Postgres 7.0.2 on FreeBSD 4.0-STABLE/Intel
compiled with [gcc version 2.95.2 19991024 (release)]
I've got this interesting problem where the query plan is not what I
expect. Every month we log details of users logging into their
webmail accounts. Every week before the database is vacuumed the
oldest week's entries are removed. The table can be quite large. Here
is the relevant parts of its structure:
Table "logins"
Attribute | Type | Modifier
-----------+-------------+----------
loginid | varchar(16) |
ip | varchar(15) |
[...]
Indices: logins_ip_idx,
logins_loginid_idx
The indexes are as follows:
Index "logins_ip_idx"
Attribute | Type
-----------+-------------
ip | varchar(15)
btree
Index "logins_loginid_idx"
Attribute | Type
-----------+-------------
loginid | varchar(16)
btree
Size of the table:
fastmail=> select count(*) from logins;
count
---------
1082564
(1 row)
Now here is a query plan from a selection using loginid:
explain select * from logins where loginid='michael';
NOTICE: QUERY PLAN:
Index Scan using logins_loginid_idx on logins (cost=0.00..500.57
rows=130 width=48)
As expected it uses the logins_loginid_idx to select the rows that
match loginid='michael';
Now I should note that I store the IP's as type varchar(15).
The following query yeilds the questionable query plan:
explain select * from logins where ip='38.30.141.44';
NOTICE: QUERY PLAN:
Seq Scan on logins (cost=0.00..25248.51 rows=11602 width=48)
This one decides to ignore the fact that IP is indexed (and the table
was vacuumed) and it does a slow-assed sequential scan through a
million or so rows to pick out (in this case 3 matching rows).
Just to be sure, I re-vacuumed the table and tried the questionable
query again with the same results. Here is the vacuum output in case
it helps:
NOTICE: --Relation logins--
NOTICE: Pages 11717: Changed 1, reaped 0, Empty 0, New 0; Tup
1082580: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen
92; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU
2.28s/15.38u sec.
NOTICE: Index logins_ip_idx: Pages 4550; Tuples 1082580. CPU
0.78s/2.65u sec.
NOTICE: Index logins_loginid_idx: Pages 3881; Tuples 1082580. CPU
0.62s/2.67u sec.
VACUUM
-Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2000-07-25 23:30:39 | Re: [SQL] Index selection on a large table |
Previous Message | Tom Lane | 2000-07-24 03:21:51 | Re: Converting a timestamp to a date when it contains nulls. |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2000-07-25 23:05:27 | Re: pg_dump |
Previous Message | sathya priya | 2000-07-25 22:14:08 | pg_dump |