Re: CUBE_MAX_DIM

From: Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Devrim Gündüz <devrim(at)gunduz(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CUBE_MAX_DIM
Date: 2020-06-25 20:47:30
Message-ID: PR1PR02MB534067DDB48CCDC51456CB69E3920@PR1PR02MB5340.eurprd02.prod.outlook.com
Lists: pgsql-hackers

> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Sent: 25 June 2020 17:43
>
> Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com> writes:
> > I know that Cube in its current form isn't suitable for nearest-neighbour searching these vectors in their raw form (I have tried recompilation with higher CUBE_MAX_DIM myself), but conceptually kNN GiST searches using Cubes can be useful for these applications. There are other pre-processing techniques that can be used to improve the speed of the search, but it still ends up with a kNN search in a high-ish dimensional space.
>
> Is there a way to fix the numerical instability involved? If we could do
> that, then we'd definitely have a use-case justifying the work to make
> cube toastable.

I am not that familiar with the nature of the numerical instability, but it might be worth noting for additional context that for the NN use case:

- The value of each dimension is likely to be between 0 and 1
- The L1 distance is meaningful for high numbers of dimensions, and it *possibly* suffers less from the numerical issues than Euclidean distance.
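
To make the two metrics concrete, here is a minimal Python sketch of a brute-force nearest-neighbour lookup under either distance. It is illustrative only and is not how the cube extension or GiST computes anything; the function names (l1_distance, euclidean_distance, nearest) are made up for this example, and the vectors are assumed to have components between 0 and 1 as described above.

```python
import math

def l1_distance(a, b):
    # Taxicab (L1) distance: sum of absolute per-dimension differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Euclidean (L2) distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, points, dist):
    # Brute-force scan: index of the point closest to `query` under `dist`.
    return min(range(len(points)), key=lambda i: dist(query, points[i]))

if __name__ == "__main__":
    # Hypothetical 4-dimensional vectors with components in [0, 1].
    points = [
        [0.9, 0.9, 0.9, 0.9],
        [0.1, 0.2, 0.1, 0.2],
        [0.5, 0.5, 0.5, 0.5],
    ]
    query = [0.0, 0.0, 0.0, 0.0]
    print("L1 nearest index:", nearest(query, points, l1_distance))
    print("Euclidean nearest index:", nearest(query, points, euclidean_distance))
```

The L1 version needs only subtraction, absolute value, and addition, whereas the Euclidean version also squares each difference and takes a square root; whether that matters for the instability in question is something I have not verified.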

Numerical stability isn't the only issue for high-dimensional kNN: GiST search performance currently degrades towards sequential-scan performance as N increases, although maybe the two are related?

> regards, tom lane

Best regards,
Alastair
