Re: Questions about indexes?

From: Ryan Bradetich <rbradetich(at)uswest(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Questions about indexes?
Date: 2003-02-17 07:05:31
Message-ID: 1045465530.30629.26.camel@beavis.ybsoft.com
Lists: pgsql-hackers

On Sun, 2003-02-16 at 23:34, Tom Lane wrote:
> Ryan Bradetich <rbradetich(at)uswest(dot)net> writes:
> > Although the table schema is immaterial, I will provide it so we have a
> > common framework for this discussion:
>
> > host_id integer (not null)
> > timestamp datetime (not null)
> > category text (not null) [<= 5 chars]
> > anomaly text (not null) [<= 1024 chars]
>
> > This table is used to store archived data, so each row in the table must
> > be unique. Currently I am using a primary key across each column to
> > enforce this uniqueness.
>
> It's not real clear to me why you bother enforcing a constraint that the
> complete row be unique. Wouldn't a useful constraint be that the first
> three columns be unique? Even if that's not correct, what's wrong with
> tolerating a few duplicates? You can't tell me it's to save on storage
> ;-)

The table holds system policy compliance data. The category is
basically the policy, and the anomaly is the detailed text explaining
why the system is out of compliance. So the anomaly data is important
(and often the reason why the key is unique). The reason we are
archiving the data is to generate reports and graphs showing policy
compliance over time. Duplicated rows would artificially inflate the
numbers in the reports and graphs. The other option we had was to
perform a DISTINCT select at report / graph time; we chose not to go
this route because of the sort added to the query. (Also it just
seemed tidier to only store good data :))
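For the archives, the two options look roughly like this (the table
name "anomalies" is my own placeholder, since the quoted schema did
not name the table):

```sql
-- Option we chose: enforce uniqueness at insert time with a
-- primary key across all four columns.
ALTER TABLE anomalies
    ADD PRIMARY KEY (host_id, timestamp, category, anomaly);

-- Option we rejected: tolerate duplicate rows in storage and
-- de-duplicate at report time, which adds a sort to every query.
SELECT DISTINCT host_id, timestamp, category, anomaly
  FROM anomalies
 WHERE timestamp >= '2003-01-01';
```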

The disk storage is a minor concern :), but I was actually looking at
it as a possible performance enhancement. I am curious how it affects
the shared buffer cache, and there should also be fewer pages to read
on average since the index would be smaller.

Does this make sense? Or am I out in left field again? :)

> > I am not sure why all the data is duplicated in the index ... but i bet
> > it has to do with performance since it would save a lookup in the main
> > table.
>
> An index that can't prevent looking into the main table wouldn't be
> worth anything AFAICS ...

Ok, scratch that idea then :) I will continue looking at other ideas
like the MD5 data hashing etc.
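Roughly, the MD5 idea would be a unique index that stores a
fixed-size digest of the anomaly text instead of the full (up to
1024-char) string. A sketch, again with "anomalies" as an assumed
table name and using the built-in md5() function:

```sql
-- Keep the index narrow by hashing the anomaly text down to a
-- 32-char hex digest; uniqueness is enforced on the digest.
CREATE UNIQUE INDEX anomalies_uniq
    ON anomalies (host_id, timestamp, category, md5(anomaly));
```

The trade-off is that an md5 collision between two genuinely
distinct anomaly strings would falsely reject a row, though the odds
of that are vanishingly small.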

Thanks for your input Tom!

- Ryan

--
Ryan Bradetich <rbradetich(at)uswest(dot)net>
