From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Иван Парфилов <iparfilov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: GSoC 2014 proposal |
Date: | 2014-04-02 10:22:20 |
Message-ID: | CAPpHfdv=N2YVM8xYp3O-Kco1u9e8CDupdrhmMn_5+nYhcDY_PA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com>wrote:
> The BIRCH algorithm as described in the paper describes building a tree in
> memory. If I understood correctly, you're suggesting to use a pre-built
> GiST index instead. Interesting idea!
>
> There are a couple of signifcant differences between the CF tree described
> in the paper and GiST:
>
> 1. In GiST, a leaf item always represents one heap tuple. In the CF tree,
> a leaf item represents a cluster, which consists of one or more tuples. So
> the CF tree doesn't store an entry for every input tuple, which makes it
> possible to keep it in memory.
>
> 2. In the CF tree, "all entries in a leaf node must satisfy a threshold
> requirement, with respect to a threshold value T: the diameter (or radius)
> has to be less than T". GiST imposes no such restrictions. An item can
> legally be placed anywhere in the tree; placing it badly will just lead to
> degraded search performance, but it's still a legal GiST tree.
>
> 3. A GiST index, like any other index in PostgreSQL, holds entries also
> for deleted tuples, until the index is vacuumed. So you cannot just use
> information from a non-leaf node and use it in the result, as the
> information summarized at a non-leaf level includes noise from the dead
> tuples.
>
> Can you elaborate how you are planning to use a GiST index to implement
> BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more
> strict in where in the tree an item can be stored, and lets the operator
> class to specify exactly when a node is split etc.
>
Hmmm, it's likely I've imagined something quite outside of this paper, and
even already suggested it to Ivan... :)
I need a little time to rethink it.
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2014-04-02 11:59:03 | Re: Including replication slot data in base backups |
Previous Message | Andres Freund | 2014-04-02 09:58:10 | Re: Including replication slot data in base backups |