From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) |
Date: | 2012-01-24 12:07:15 |
Message-ID: | CAPpHfduv5bwwXgdErZvneX4bT0hDb6MLznk63+Ht9vGKBE2n1Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi!
New version of patch is attached.
On Thu, Dec 22, 2011 at 11:46 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> A few comments:
>
> * In range_gist_picksplit, it would be nice to have a little bit more
> intuitive description of what's going on with the nonEmptyCount and
> nonInfCount numbers. For instance, it appears to depend on the fact that
> a range must either be in nonEmptyCount or in nonInfCount. Also, can you
> describe the reason you're multiplying by two and taking the absolute
> value? It seems to work, but I am missing the intuition behind those
> operations.
>
total_count - 2*nonEmptyCount = (total_count - nonEmptyCount) -
nonEmptyCount = emptyCount - nonEmptyCount
So, it's really not evident. I've simplified it.
> * The penalty function is fairly hard to read still. At a high level, I
> think we're trying to accomplish a few things (in order from most to
> least important):
> - Keep normal ranges separate.
> - Avoid broadening the class of the original predicate (e.g. turning
> single-infinite into double-infinite).
> - Avoid broadening (as determined by subtype_diff) the original
> predicate.
> - Favor adding ranges to narrower original predicates.
>
> Do you agree? If so, perhaps we can attack those more directly and it
> might be a little more readable.
>
> Additionally, the arbitrary numbers might become a problem. Can we
> choose better constants there? They would still be arbitrary when
> compared with real numbers derived from subtype_diff, but maybe we can
> still do better than what's there.
>
I've changes some comments and add constants for penalty values.
> * Regarding the leftover "common" entries that can go to either side:
> what is the "delta" measure trying to accomplish? When I work through
> some examples, it seems to favor putting larger common ranges on the
> left (small delta) and smaller common ranges on the right (smaller
> delta). Why is that good? Or did I misread the code?
>
> Intuitively, I would think that we'd want to push the ranges with lower
> upper bounds to the left and higher lower bounds to the right -- in
> other words, recurse. Obviously, we'd need to make sure it terminated at
> some point, but splitting the common entries does seem like a smaller
> version of the original problem. Thoughts?
>
That was a bug. Actually, no "abs" is needed. Indeed it doesn't affect
result significantly.
-----
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
rangetypegist-0.6.patch.gz | application/x-gzip | 10.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2012-01-24 12:38:04 | Re: Client Messages |
Previous Message | Simon Riggs | 2012-01-24 11:22:16 | Re: Online base backup from the hot-standby |