Re: [PATCH] kNN for btree

From: "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, 'Alvaro Herrera' <alvherre(at)alvh(dot)no-ip(dot)org>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] kNN for btree
Date: 2024-04-02 12:30:02
Message-ID: 168464ba-690a-487f-a4db-8b243c494af8@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Andrey!

On 31.03.2024 12:22, Andrey M. Borodin wrote:
>
>
>> On 15 Jan 2024, at 13:11, Anton A. Melnikov <a(dot)melnikov(at)postgrespro(dot)ru> wrote:
>>
>> If there are any ideas pro and contra would be glad to discuss them.
>
> Hi, Anton!
>
> This is kind of ancient thread. I've marked CF entry [0] as "Needs review" and you as an author (seems like you are going to be a point of correspondence on this feature).
>

That's right, i would like to bring the work on this feature to a positive result.

First of all, let me share a simple test that allows one to estimate the effect of applying this patch and,
i hope, can be considered as a criterion for future versions.

For all the tests below, one should set the following settings:
set enable_seqscan to false;
set enable_indexscan to true;
set enable_bitmapscan to false;
set enable_indexonlyscan to true;
set max_parallel_workers_per_gather = 0;

To carry out the test, one can use the table "events" mentioned in the first message of this thread, linked here [1].
psql -f events.dump
Then perform a query like that:
explain (costs off, analyze on) SELECT date FROM events ORDER BY date <-> '1957-10-04'::date ASC LIMIT 100000;

When using the existing btree_gist extension and preliminary commands executing:
create extension btree_gist;

CREATE INDEX event_date_idx ON events USING gist (date);

the test query gives:

postgres=# explain (costs off, analyze on) SELECT date FROM events ORDER BY date <-> '1957-10-04'::date ASC LIMIT 100000;

QUERY PLAN

------------------------------------------------------------------------------------------------------

Limit (actual time=0.759..102.992 rows=100000 loops=1)

-> Index Only Scan using event_date_idx on events (actual time=0.757..97.021 rows=100000 loops=1)

Order By: (date <-> '1957-10-04'::date)

Heap Fetches: 0

Planning Time: 0.504 ms

Execution Time: 108.311 ms

Average value on my PC was 107+-1 ms.

When using an existing patch from [1] and creating a btree index:

CREATE INDEX event_date_idx ON events USING btree (date);

instead of btree_gist one, the test query gives:

postgres=# explain (costs off, analyze on) SELECT date FROM events ORDER BY date <-> '1957-10-04'::date ASC LIMIT 100000;

QUERY PLAN

------------------------------------------------------------------------------------------------------

Limit (actual time=0.120..48.817 rows=100000 loops=1)

-> Index Only Scan using event_date_idx on events (actual time=0.117..42.610 rows=100000 loops=1)

Order By: (date <-> '1957-10-04'::date)

Heap Fetches: 0

Planning Time: 0.487 ms

Execution Time: 54.463 ms

55+-1 ms on average.
The execution time is reduced by ~2 times. So the effect is obvious
but the implementation problems are reasonable too.

On 15.01.2024 11:11, Anton A. Melnikov wrote:
> On 16.03.2020 16:17, Alexander Korotkov wrote:
>> After another try to polish this patch I figured out that the way it's
>> implemented is unnatural. I see the two reasonable ways to implement
>> knn for B-tree, but current implementation matches none of them.
>>
>> 1) Implement knn as two simultaneous scans over B-tree: forward and
>> backward. It's similar to what current patchset does. But putting
>> this logic to B-tree seems artificial. What B-tree does here is still
>> unidirectional scan. On top of that we merge results of two
>> unidirectional scans. The appropriate place to do this high-level
>> work is IndexScan node or even Optimizer/Executor (Merge node over to
>> IndexScan nodes), but not B-tree itself.
>> 2) Implement arbitrary scans in B-tree using priority queue like GiST
>> and SP-GiST do. That would lead to much better support for KNN. We
>> can end up in supporting interesting cases like "ORDER BY col1 DESC,
>> col2 <> val1, col2 ASC" or something. But that's requires way more
>> hacking in B-tree core.
>

> At first i'm going to implement p.1). I think it's preferable for now
> because it seems easier and faster to get a working version.
>

I was wrong here. Firstly, this variant turned out to be not so easy and fast,
and secondly, when i received the desired query plan, i was not happy with the results:

In the case of btree_gist, splitting the query into two scans at the optimizer level
and adding MergeAppend on the top of it resulted in a ~13% slowdown in query execution.
The average time became ~121 ms.

Limit (actual time=1.205..117.689 rows=100000 loops=1)

-> Merge Append (actual time=1.202..112.260 rows=100000 loops=1)

Sort Key: ((events.date <-> '1957-10-04'::date))

-> Index Only Scan using event_date_idx on events (actual time=0.713..43.372 rows=42585 loops=1)

Index Cond: (date < '1957-10-04'::date)

Order By: (date <-> '1957-10-04'::date)

Heap Fetches: 0

-> Index Only Scan using event_date_idx on events events_1 (actual time=0.486..58.015 rows=57416 loops=1)

Index Cond: (date >= '1957-10-04'::date)

Order By: (date <-> '1957-10-04'::date)

Heap Fetches: 0

Planning Time: 1.212 ms

Execution Time: 120.517 ms

When using the btree index and the existing v18 patch, the slowdown from dividing the request
into two scans was less, ~3-4%, but i'm not sure about the correctness of the comparison in this case,
since the btree low level in the first variant proposed by Alexander
should work differently, like unpatched one.

Overall in terms of efficiency, the implementation of the first variant
turns out to be worse than the existing version of the patch.
IMO there is an additional argument pro the second variant proposed by Alexander.
The existing version of the patch does not support sorting in descending order.
Adding DESC to the test query gives:

postgres=# explain (costs off, analyze on) SELECT date FROM events ORDER BY date <-> '1957-10-04'::date DESC LIMIT 100000;

QUERY PLAN

--------------------------------------------------------------------------------

Limit (actual time=113.455..133.790 rows=100000 loops=1)

-> Sort (actual time=113.453..128.446 rows=100000 loops=1)

Sort Key: ((date <-> '1957-10-04'::date)) DESC

Sort Method: external merge Disk: 2680kB

-> Seq Scan on events (actual time=0.032..43.613 rows=151643 loops=1)

Planning Time: 0.514 ms

Execution Time: 142.278 ms

IndexOnlyScan disappears from the plan, and the query execution time increases by ~2.5 times.

For that regard, the existing implementation in btree_gist behaves more adequately:

postgres=# explain (costs off, analyze on) SELECT date FROM events ORDER BY date <-> '1957-10-04'::date DESC LIMIT 100000;

QUERY PLAN

------------------------------------------------------------------------------------------------------------

Limit (actual time=144.409..163.660 rows=100000 loops=1)

-> Sort (actual time=144.406..158.267 rows=100000 loops=1)

Sort Key: ((date <-> '1957-10-04'::date)) DESC

Sort Method: external merge Disk: 2680kB

-> Index Only Scan using event_date_idx on events (actual time=0.553..81.035 rows=151643 loops=1)

Heap Fetches: 0

Planning Time: 0.525 ms

Execution Time: 172.201 ms

IndexOnlyScan remains in the request, the query execution time increases by ~1.5 times in comparison with ASC order.

It seems that it would be better if the both sorting directions won't give essentially different plans
and not differ greatly from each other in execution time.

On 31.03.2024 12:22, Andrey M. Borodin wrote:
>
> At this point it's obvious that the feature won't make it to 17, so let's move to July CF.
>

Of course. IMHO, this is the most suitable solution at the moment.
Thank you!

With the best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

[1] https://github.com/antamel/postgres/raw/test-base-events/test-rel/events.dump.gz

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-04-02 12:31:40 Re: [HACKERS] make async slave to wait for lsn to be replayed
Previous Message Alexander Korotkov 2024-04-02 12:25:49 Re: [HACKERS] make async slave to wait for lsn to be replayed