From: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Subject: | Re: Adding skip scan (including MDAM style range skip scan) to nbtree |
Date: | 2024-07-02 12:52:58 |
Message-ID: | CAJ7c6TMx43RT7qDHWtqtfXtmd0iZOWyUOd8M2Ch-gvXNCOpXGQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Peter,
> Attached is a POC patch that adds skip scan to nbtree. The patch
> teaches nbtree index scans to efficiently use a composite index on
> '(a, b)' for queries with a predicate such as "WHERE b = 5". This is
> feasible in cases where the total number of distinct values in the
> column 'a' is reasonably small (think tens or hundreds, perhaps even
> thousands for very large composite indexes).
>
> [...]
>
> Thoughts?
Many thanks for working on this. I believe it is an important feature
and it would be great to deliver it during the PG18 cycle.
I experimented with the patch and here are the results I got so far.
Firstly, it was compiled on Intel MacOS and ARM Linux. All the tests
pass just fine.
Secondly, I tested the patch manually using a release build on my
Raspberry Pi 5 and the GUCs that can be seen in [1].
Test 1 - simple one.
```
CREATE TABLE test1(c char, n bigint);
CREATE INDEX test1_idx ON test1 USING btree(c,n);
INSERT INTO test1
SELECT chr(ascii('a') + random(0,2)) AS c,
random(0, 1_000_000_000) AS n
FROM generate_series(0, 1_000_000);
EXPLAIN [ANALYZE] SELECT COUNT(*) FROM test1 WHERE n > 900_000_000;
```
Test 2 - a more complicated one.
```
CREATE TABLE test2(c1 char, c2 char, n bigint);
CREATE INDEX test2_idx ON test2 USING btree(c1,c2,n);
INSERT INTO test2
SELECT chr(ascii('a') + random(0,2)) AS c1,
chr(ascii('a') + random(0,2)) AS c2,
random(0, 1_000_000_000) AS n
FROM generate_series(0, 1_000_000);
EXPLAIN [ANALYZE] SELECT COUNT(*) FROM test2 WHERE n > 900_000_000;
```
Test 3 - to see how it works with covering indexes.
```
CREATE TABLE test3(c char, n bigint, s text DEFAULT 'text_value' || n);
CREATE INDEX test3_idx ON test3 USING btree(c,n) INCLUDE(s);
INSERT INTO test3
SELECT chr(ascii('a') + random(0,2)) AS c,
random(0, 1_000_000_000) AS n,
'text_value_' || random(0, 1_000_000_000) AS s
FROM generate_series(0, 1_000_000);
EXPLAIN [ANALYZE] SELECT s FROM test3 WHERE n < 1000;
```
In all the cases the patch worked as expected.
I noticed that with the patch we choose Index Only Scans for Test 1
and without the patch - Parallel Seq Scan. However the Parallel Seq
Scan is 2.4 times faster. Before the patch the query takes 53 ms,
after the patch - 127 ms. I realize this could be just something
specific to my hardware and/or amount of data.
Do you think this is something that was expected or something worth
investigating further?
I haven't looked at the code yet.
[1]: https://github.com/afiskon/pgscripts/blob/master/single-install-meson.sh
--
Best regards,
Aleksander Alekseev
From | Date | Subject | |
---|---|---|---|
Next Message | Dagfinn Ilmari Mannsåker | 2024-07-02 12:55:25 | Re: Cleaning up perl code |
Previous Message | Amit Langote | 2024-07-02 12:38:41 | Re: Doc Rework: Section 9.16.13 SQL/JSON Query Functions |