From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) |
Date: | 2011-11-27 18:43:37 |
Message-ID: | 4870.1322419417@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Sat, Nov 26, 2011 at 11:11 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> There's been some significant change in rangetypes_gist.c, can you
>> please rebase this patch?
> OK, rebased with head.
I looked at this patch a bit. I agree with the aspect of it that says
"let's add a flag bit so we can tell whether an upper GiST item includes
any empty ranges"; I think we really need that in order to make
contained_by searches usable. However, I'm not so happy with the
proposed rewrite of the penalty/picksplit functions. I see two problems
there:
1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
values obtained from subtype_diff. This is not good, because you have
no idea what scale the subtype differences will be expressed on. The
hard-wired values could be greatly larger than range widths, or greatly
smaller, resulting in randomly different index behavior.
2. It's too large/complicated. You're proposing to add nearly a
thousand lines to rangetypes_gist.c, and I do not see any reason to
think that this is so much better than what's there now as to justify
that kind of increment in the code size. I saw your performance
results, but one set of results on an arbitrary (not-real-world) test
case doesn't prove a lot to me; and in particular it doesn't prove that
we couldn't do as well with a much smaller and simpler patch.
There are a lot of garden-variety coding problems, too, for instance here:
+ *penalty = Max(DatumGetFloat8(FunctionCall2(
+ subtype_diff, orig_lower.val, new_lower.val)), 0.0);
which is going to uselessly call the subtype_diff function twice most of
the time (Max() is only a macro), plus you left off the collation
argument. But I don't think it's worth worrying about those until the
big picture is correct, which I feel it isn't yet.
Earlier in the thread you wrote:
> Questions:
> 1) I'm not sure about whether we need support of <> in GiST, because it
> always produces full index scan (except search for non-empty ranges).
I was thinking the same thing; that opclass entry seems pretty darn
useless.
I propose to pull out and apply the changes related to the
RANGE_CONTAIN_EMPTY flag, and also remove the <> opclass entry,
because I think these are uncontroversial and in the nature of
"must fix quickly". The redesign of the penalty and picksplit
functions should be discussed separately.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2011-11-27 18:47:21 | information schema/aclexplode doesn't know about default privileges |
Previous Message | Alexander Soudakov | 2011-11-27 18:29:45 | Re: Feature proposal: www_fdw |