From: | "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: indexing for distinct search in timestamp based table |
Date: | 2008-09-05 15:10:35 |
Message-ID: | 1d709ecc0809050810m42bfbedfk98ac7d83f13fb7@mail.gmail.com |
Lists: pgsql-performance
You might get a great improvement for the '%' cases using an index on channel_name(<field>, start_time) and a little bit of PL/pgSQL.
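For concreteness, the index might be created like this (ad_log and channel_name stand in for the actual table and <field>; adjust to the real schema):

    create index ad_log_field_start on ad_log (channel_name, start_time);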
Basically, you need to implement the following algorithm (a PL/pgSQL sketch follows the list):
1) curr_<field> = ( select min(<field>) from ad_log )
2) record_exists = ( select 1 from ad_log where <field> = curr_<field> and _all_other_conditions limit 1 )
3) if record_exists == 1 then add curr_<field> to the results
4) curr_<field> = ( select min(<field>) from ad_log where <field> > curr_<field> )
5) if curr_<field> is not null then goto 2
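Here is a minimal PL/pgSQL sketch of that loop. The table (ad_log), the distinct column (channel_name), and a start_time range standing in for _all_other_conditions are assumptions; with the (channel_name, start_time) index above, every statement in the loop resolves as a cheap index probe:

    create or replace function distinct_channels(p_from timestamp, p_to timestamp)
    returns setof text language plpgsql stable as $$
    declare
        curr     text;
        one_row  int;
    begin
        -- step 1: the smallest value in the table
        select min(channel_name) into curr from ad_log;
        while curr is not null loop
            -- step 2: does any row with this value satisfy the other conditions?
            -- (one_row is set to null when no row matches)
            select 1 into one_row
              from ad_log
             where channel_name = curr
               and start_time >= p_from and start_time < p_to
             limit 1;
            -- step 3: emit the value if such a row exists
            if one_row = 1 then
                return next curr;
            end if;
            -- steps 4 and 5: index-probe for the next larger value, or stop
            select min(channel_name) into curr
              from ad_log
             where channel_name > curr;
        end loop;
    end;
    $$;

Calling it is then just: select * from distinct_channels('2008-09-01', '2008-09-06');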
I believe it might make sense to implement this approach in the core (I would call it an "index distinct scan").
That could dramatically improve "select distinct <column> from <table>" and "select <column> from <table> group by <column>" kinds of queries when an index on <column> exists and the column has a very small number of distinct values.
For instance: say a table has 10,000,000 rows, while the column of interest has only 20 distinct values. In that case, the database would be able to fetch each of those 20 values in roughly 20 index lookups, instead of scanning all 10,000,000 rows.
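For what it's worth, on servers that support WITH RECURSIVE (PostgreSQL 8.4 and later) the same skip-scan loop can be sketched in plain SQL, without PL/pgSQL; ad_log and channel_name are the same assumed names as above:

    -- each recursion step is one index probe for the next larger value
    with recursive t as (
        select min(channel_name) as val from ad_log
        union all
        select (select min(channel_name)
                  from ad_log
                 where channel_name > t.val)
          from t
         where t.val is not null
    )
    select val from t where val is not null;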
What does the community think about that?