From: | Neil Conway <neilc(at)samurai(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)stack(dot)net> |
Subject: | GiST: PickSplit and multi-attr indexes |
Date: | 2004-11-14 09:21:43 |
Message-ID: | 419723A7.5040505@samurai.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Oleg & Teodor,
If I understand the code correctly, GiST will only pass the first
attribute of each index tuple to the user-defined PickSplit method when
it wants to split a node. (see circa line 1269 of gist.c)
Is this a wise design decision? Granted, in many situations the first
attribute in the index is sufficient to make a reasonable decision about
how to divide the node into two halves, but I don't think that is
universally true. For example, consider a two column index whose first
attribute has a small number of distinct values. It could well be that
all the first attribute values in a node-to-be-split would be the same.
Only passing the first attribute to PickSplit would result in an
essentially random distribution of tuples among the split nodes, rather
than allowing the GiST extension to make use of the second attribution
to partition the nodes. That's an extreme example, but it is easy to
construct more realistic scenarios (basically, any situation in which
the cardinality of the first index attribute is low -- a reasonably
common occurrence with a multi-column index, I believe).
I'm not sure whether this would be a problem in practice. Speculation:
repeated invocations of PickSplit are one of the main factors in
deriving the ultimate shape of the GiST tree. Poor distribution of keys
resulting from PickSplit would eventually result in unnecessarily loose
bounding predicates in internal nodes, which would hurt performance.
Comments welcome,
Neil
From | Date | Subject | |
---|---|---|---|
Next Message | Joachim Wieland | 2004-11-14 09:33:01 | postmaster segfaults with HUGE table |
Previous Message | Thomas Hallgren | 2004-11-14 08:47:11 | Re: CVS should die |