Re: What kind of index to use for many rows with few unique values?

From: Joel Burton <joel(at)joelburton(dot)com>
To: "David F(dot) Skoll" <dfs(at)roaringpenguin(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: What kind of index to use for many rows with few unique values?
Date: 2002-12-02 23:04:38
Message-ID: 20021202230438.GA16554@temp.joelburton.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Mon, Dec 02, 2002 at 05:10:00PM -0500, David F. Skoll wrote:
> Hi,
>
> I have a table with a column called "state". Each row can be in one
> of four states, let's call them 'new', 'pending', 'ok', and 'bad'.
> On average, about 95% of the rows will be 'bad', with the remaining
> 5% being in one of the other three states. If the table has 50K rows
> and I just want to pull out the 'ok' rows, I don't want to do a sequential
> scan. To pull out the 'bad' rows, obviously, sequential scan is fine.
>
> I've heard that a btree index performs badly in this situation. Is
> a hash index appropriate? I've heard bad things about hash indexes in
> PostgreSQL.

create table states (id serial primary key, state varchar(10), t text );

create function fill_states(varchar, int) returns bool as '
begin for i in 1 .. $2
loop
insert into states (state, t) values ($1, ''random'');
end loop;
return true;
end;

' language plpgsql;

select fill_states('ok',45000);
select fill_states('bad',5000);
select fill_states('warning',5000);

analyze states;

joel(at)joel=# explain select * from states where state='warning';
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using state_idx on states (cost=0.00..1013.00 rows=5533 width=20)
Index Cond: (state = 'warning'::character varying)
(2 rows)


joel(at)joel=# explain select * from states where state='ok';
QUERY PLAN
--------------------------------------------------------------
Seq Scan on states (cost=0.00..1171.51 rows=44627 width=20)
Filter: (state = 'ok'::character varying)
(2 rows)

Looks right to me: index scan for the less-common option, seqscan for
the most common. Why don't you think this, as a btree, will work for
you?

--

Joel BURTON | joel(at)joelburton(dot)com | joelburton.com | aim: wjoelburton
Independent Knowledge Management Consultant

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message David F. Skoll 2002-12-02 23:14:49 Re: What kind of index to use for many rows with few unique
Previous Message David F. Skoll 2002-12-02 22:10:00 What kind of index to use for many rows with few unique values?