Quick Links

Re: What kind of index to use for many rows with few unique values?

From:	Joel Burton <joel(at)joelburton(dot)com>
To:	"David F(dot) Skoll" <dfs(at)roaringpenguin(dot)com>
Cc:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: What kind of index to use for many rows with few unique values?
Date:	2002-12-02 23:04:38
Message-ID:	20021202230438.GA16554@temp.joelburton.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin

On Mon, Dec 02, 2002 at 05:10:00PM -0500, David F. Skoll wrote:
> Hi,
>
> I have a table with a column called "state". Each row can be in one
> of four states, let's call them 'new', 'pending', 'ok', and 'bad'.
> On average, about 95% of the rows will be 'bad', with the remaining
> 5% being in one of the other three states. If the table has 50K rows
> and I just want to pull out the 'ok' rows, I don't want to do a sequential
> scan. To pull out the 'bad' rows, obviously, sequential scan is fine.
>
> I've heard that a btree index performs badly in this situation. Is
> a hash index appropriate? I've heard bad things about hash indexes in
> PostgreSQL.

create table states (id serial primary key, state varchar(10), t text );

create function fill_states(varchar, int) returns bool as '
begin for i in 1 .. $2
loop
insert into states (state, t) values ($1, ''random'');
end loop;
return true;
end;

' language plpgsql;

select fill_states('ok',45000);
select fill_states('bad',5000);
select fill_states('warning',5000);

analyze states;

joel(at)joel=# explain select * from states where state='warning';
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using state_idx on states (cost=0.00..1013.00 rows=5533 width=20)
Index Cond: (state = 'warning'::character varying)
(2 rows)

joel(at)joel=# explain select * from states where state='ok';
QUERY PLAN
--------------------------------------------------------------
Seq Scan on states (cost=0.00..1171.51 rows=44627 width=20)
Filter: (state = 'ok'::character varying)
(2 rows)

Looks right to me: index scan for the less-common option, seqscan for
the most common. Why don't you think this, as a btree, will work for
you?

Joel BURTON | joel(at)joelburton(dot)com | joelburton.com | aim: wjoelburton
Independent Knowledge Management Consultant

In response to

What kind of index to use for many rows with few unique values? at 2002-12-02 22:10:00 from David F. Skoll

Responses

Re: What kind of index to use for many rows with few unique at 2002-12-02 23:14:49 from David F. Skoll

Browse pgsql-admin by date

	From	Date	Subject
Next Message	David F. Skoll	2002-12-02 23:14:49	Re: What kind of index to use for many rows with few unique
Previous Message	David F. Skoll	2002-12-02 22:10:00	What kind of index to use for many rows with few unique values?