From: | Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Devrim Gündüz <devrim(at)gunduz(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: CUBE_MAX_DIM |
Date: | 2020-06-25 20:47:30 |
Message-ID: | PR1PR02MB534067DDB48CCDC51456CB69E3920@PR1PR02MB5340.eurprd02.prod.outlook.com |
Lists: | pgsql-hackers |
> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Sent: 25 June 2020 17:43
>
> Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com> writes:
> > I know that Cube in its current form isn't suitable for nearest-neighbour searching these vectors in their raw form (I have tried recompilation with higher CUBE_MAX_DIM myself), but conceptually kNN GiST searches using Cubes can be useful for these applications. There are other pre-processing techniques that can be used to improve the speed of the search, but it still ends up with a kNN search in a high-ish dimensional space.
>
> Is there a way to fix the numerical instability involved? If we could do
> that, then we'd definitely have a use-case justifying the work to make
> cube toastable.
I am not that familiar with the nature of the numerical instability, but it might be worth noting for additional context that for the NN use case:
- The value of each dimension is likely to be between 0 and 1
- The L1 distance is meaningful for high numbers of dimensions, and *possibly* suffers less from the numerical issues than Euclidean distance.
Numerical stability isn't the only issue for high-dimensional kNN: GiST search performance currently degrades towards sequential scan performance as the number of dimensions N increases, although maybe the two are related?
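To make the two metrics mentioned above concrete, here is a minimal sketch in plain Python. It is not the cube extension's C implementation, and the function names and the 512-dimension figure are purely illustrative assumptions. In the cube extension itself these correspond to the `<#>` (taxicab) and `<->` (Euclidean) operators.

```python
import math
import random


def l1_distance(a, b):
    """Taxicab (L1) distance: sum of absolute per-dimension differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Euclidean (L2) distance: square root of the sum of squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical example: two random vectors with every component in [0, 1),
    # matching the value range described for the NN use case.
    rng = random.Random(0)
    dims = 512  # illustrative dimensionality, not taken from the thread
    a = [rng.random() for _ in range(dims)]
    b = [rng.random() for _ in range(dims)]
    print("L1:", l1_distance(a, b))
    print("Euclidean:", euclidean_distance(a, b))
```

With components restricted to [0, 1], the L1 distance between two N-dimensional points is bounded by N and the Euclidean distance by sqrt(N). The sketch says nothing about the numerical behaviour inside cube or GiST; it only shows what is being compared.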
> regards, tom lane
Best regards,
Alastair