Re: single index on more than two coulumns a bad thing?

From: "Aaron Werman" <awerman(at)hotmail(dot)com>
To: "Leeuw van der, Tim" <tim(dot)leeuwvander(at)nl(dot)unisys(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: single index on more than two coulumns a bad thing?
Date: 2004-04-05 02:09:21
Message-ID: BAY9-DAV30oQREMe8bO000243eb@hotmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

You're absolutely correct that the general rule is to lead a composite index
with the highest cardinality index columns for fastest selectivity. Indices
and all physical design are based on usage. In this case of unique indices
supporting primary keys in a hierarchy, it depends. For selection of small
sets of arbitrary rows, your arrangement is best. For hierarchy based
queries, such as "for grandparent of foo, and parent of bar, give average
age of sons" - the hierarchy based index is often more efficient.

Surrogate keys have a role, and can improve performance, but also carry an
enormous penalty of intentionally obfuscating logical keys and data
semantics, and almost always lead to data errors not being caught because
they obscure irrational relationships. I hate them, but use them frequently
in high transaction rate operational systems where there is much functional
validation outside the dbms (and the apps behave therefore like object
databases and surrogate keys are network database pointers) and in data
warehousing (where downstream data cannot be corrected anyway).

/Aaron

----- Original Message -----
From: "Leeuw van der, Tim" <tim(dot)leeuwvander(at)nl(dot)unisys(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Sent: Sunday, April 04, 2004 5:06 PM
Subject: Re: [PERFORM] single index on more than two coulumns a bad thing?

Hi Aaron,

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org]On Behalf Of
> Aaron Werman
> Sent: vrijdag 2 april 2004 13:57
>
>
> another thing that I have all over the place is a hierarchy:
> index on grandfather_table(grandfather)
> index on father_table(grandfather, father)
> index on son_table(grandfather, father, son)
>

It depends on your data-distribution, but I find that in almost all cases
it's beneficial to have your indexes the other way round in such cases:

index on grandfather_table(grandfather)
index on father_table(father, grandfather)
index on son_table(son, father, grandfather)

That usually gives a less common, more selective value at the start of the
index, making the initial selection in the index smaller.

And AFAIK I don't have to rewrite my queries for that; the planner doesn't
care about the order of expressions in the query that are on the same level.

That said, I tend to use 'surrogate keys'; keys generated from sequences or
auto-number columns for my tables. It makes the tables less readable, but
the indexes remain smaller.

Greetings,

--Tim

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Heiko Kehlenbrink 2004-04-05 15:31:39 performance comparission postgresql/ms-sql server
Previous Message Leeuw van der, Tim 2004-04-04 21:06:11 Re: single index on more than two coulumns a bad thing?