From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | amborodin(at)acm(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Subject: | Re: GiST penalty functions [PoC] |
Date: | 2016-09-06 16:57:28 |
Message-ID: | 236d7d60-4415-f4c9-34d7-656a8314d1bd@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 08/29/2016 09:04 AM, Andrew Borodin wrote:
> In this message I address only first problem. I want to construct a
> penalty function that will choose:
> 1. Entry with a zero volume and smallest margin, that can
> accommodate insertion tuple without extension, if one exists;
> 2. Entry with smallest volume, that can accommodate insertion tuple
> without extension, if one exists;
> 3. Entry with zero volume increase after insert and smallest margin
> increase, if one exists;
> 4. Entry with minimal volume increase after insert.
Looking at the code, by "margin", you mean the sum of all "edges", i.e.
of each dimension, of the cube. I guess the point of that is to
differentiate between cubes that have the same volume, but a different
shape.
So, let's try to come up with a scheme that doesn't require IEEE 754
floats. Those above cases can be split into two categories, depending on
whether the new value has zero volume or not. We can use a completely
different scheme for the two cases, if we want, because when we're
choosing a target page to insert to, penalty gets called for every
original tuple, but with the same new tuple.
Here's a scheme I just came up with. There might be better ones, but
it's something.
Let's have:
oX: Original tuple's edge sum
nX: New tuple's edge sum
dX: Edge increase
oV: Original tuple's volume
nX: New tuple's volume
dX: Volume increase
Case A: New entry has zero volume.
------
Two sub-cases:
A1: if dE > 0, use dE. dE must be in the range [0, nE]
A2: otherwise, use oE.
So how do we cram those two into a single float?
If we offset case A1 by n, we can use the range between [0, nE] for A2.
Something like this pseudocode:
if (dE > 0)
return nE + dE; /* A1, offset dE into range [nE, inf] */
else
return nE - (nE/oE); /* A2, scale oE into range [0, nE] */
Case B: New entry has non-zero volume
------
B1: if dV > 0. use dV. dV must be in the range [0, nV].
B2: if dE > 0, use dE. dE must be in the range [0, nE].
B3: oV, otherwise.
By offsetting cases B1 and B2, we can again divide the range into three
parts, with pseudocode like:
if (dV > 0)
return nV + nE + dV; /* B1, offset dV into range [nV+nE, inf] */
else if (dE > 0)
return nV + dE; /* B2, offset dE into range [nV, nV+nE] */
else
return nV - (nV/oV) /* B3, scale oV into range [0, nV]
> I’ve tested patch performance with attached test. On my machine patch
> slows index construction time from 60 to 76 seconds, reduces size of
> the index from 85Mb to 82Mb, reduces time of aggregates computation
> from 36 seconds to 29 seconds.
Hmm. That's a pretty large increase in construction time. Have you done
any profiling on where the time goes?
> I do not know: should I continue this work in cube, or it’d be better
> to fork cube?
Should definitely continue work within cube, IMHO. I don't think forking
it to a new datatype would make this any easier.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Claudio Freire | 2016-09-06 16:58:25 | Re: Vacuum: allow usage of more than 1GB of work mem |
Previous Message | Anastasia Lubennikova | 2016-09-06 16:48:31 | Re: WIP: Covering + unique indexes. |