From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Jay Levitt <jay(dot)levitt(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Stas Kelvich <stanconn(at)gmail(dot)com> |
Subject: | Re: Cube extension improvement, GSoC |
Date: | 2013-03-30 11:55:33 |
Message-ID: | CAPpHfdtsKPx7Kxy8eAAdD=H1V5Eh5H+TEz5sxUvF=twURgBAnw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, Jey!
I remember I've started couple of threads related to cube extension:
http://www.postgresql.org/message-id/4F30616D.3030509@gmail.com
http://www.postgresql.org/message-id/4F3C16E9.90808@gmail.com
Could you provide some feedback to GSoC proposal of Stas?
On Sat, Mar 23, 2013 at 3:10 AM, Stas Kelvich <stanconn(at)gmail(dot)com> wrote:
> Hello,
>
> some time ago I started working on the data search system (about 100-200M
> of records) with queries consisted of several diapason and equality
> conditions, e.g.:
>
> WHERE dim1 BETWEEN 128 AND 137 AND
> WHERE dim2 BETWEEN 4815 AND 162342 AND
> WHERE dim3 = 42
> ORDER BY dim1 ASC
>
> There are 6 or 7 search criteria in my task. In order to avoid full table
> scan I started using R-Tree over cube data type:
>
> CREATE INDEX ON my_table USING GiST(cube(array[dim1, dim2, dim3]))
>
> For fast sorting I used Alexander Korotkov's patch for knngist (
> http://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg153676.html)
> which changes metric for nearest neighbors search and allows to obtain
> cubes ordered by a specific coordinate. Having changed some data types and
> function/operator definitions I ported this to 9.2 (
> https://github.com/kelvich/postgres/commit/bb372) Then, after this I
> could select ordered records right from the index:
>
> SELECT * FROM my_table
> WHERE cube(array[dim1, dim2, dim3]) <@ cube
> '(128,4815,42),(137,162342,42)'
> ORDER BY cube(array[dim1, dim2, dim3]) <*> 5;
>
> The main issue of such approach is the index size. In my case it was about
> 100GB for 100M of records. Therefore index building becomes very expensive
> disk-related operation. For the same reason reading a large number of
> records from the index is slow too.
>
> I came to Oleg Bartunov, Theodor Sigaev and after a while to Alexander
> Korotkov for any help. (I'm very thankful to them and glad that they agreed
> to meet, listen to me and give useful advices). Having discussed it we
> decided that there was several possible improvements for the cube extension:
>
> * Adding point data type support to the cube extension in order to avoid
> storing of coincident upper left and lower right vertices, which may reduce
> the volume that leaf nodes occupy almost twice.
> * Checking the split algorithm with big datasets and, if possible,
> improving it.
> * Learning cube extension to store dimensions with different data types.
> Such index would be good alternative to compound key B-Tree multi-index
> (suitable for diapason queries and data ordering).
> * Providing support for KNN with metrics induced by the different norms:
> euclidean, taxicab norm, p-norm. This can be useful in fields where we can
> extract signature: similar images search, similar audio search.
>
> I'd like to participate in GSoC with this improvements, and I'm very
> interested in any comments or suggestions about this feature list.
>
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | 赖文豫 | 2013-03-30 14:08:44 | By now, why PostgreSQL 9.2 don't support SSDs? |
Previous Message | Alexander Korotkov | 2013-03-30 08:29:52 | Re: Getting to 9.3 beta |