From: | Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | [WIP PATCH] Index scan offset optimisation using visibility map |
Date: | 2018-01-31 22:17:35 |
Message-ID: | CANtu0oi3a1Rf1PVsBufQbm+g9ytSv75+kp7kJYvK5C1qF_0Siw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello.
WIP-Patch for optimisation of OFFSET + IndexScan using visibility map.
Patch based on idea of Maxim Boguk [1] with some inspiration from Douglas
Doole [2].
---------
Everyone knows - using OFFSET (especially big) is not an good practice.
But in reality they widely used mostly for paging (because it is simple).
Typical situation is some table (for example tickets) with indexes used for
paging\sorting:
VACUUM FULL;
VACUUM ANALYZE ticket;
SET work_mem = '512MB';
SET random_page_cost = 1.0;
CREATE TABLE ticket AS
SELECT
id,
TRUNC(RANDOM() * 100 + 1) AS project_id,
NOW() + (RANDOM() * (NOW()+'365 days' - NOW())) AS created_date,
repeat((TRUNC(RANDOM() * 100 + 1)::text), 1000) as payload
FROM GENERATE_SERIES(1, 1000000) AS g(id);
CREATE INDEX simple_index ON ticket using btree(project_id, created_date);
And some typical query to do offset on tickets of some project with paging,
some filtration (based on index) and sorting:
SELECT * FROM ticket
WHERE project_id = ?
AND created_date > '20.06.2017'
ORDER BY created_date offset 500 limit 100;
At the current moment IndexScan node will be required to do 600 heap
fetches to execute the query.
But first 500 of them are just dropped by the NodeLimit.
The idea of the patch is to push down offset information in
ExecSetTupleBound (like it done for Top-sort) to IndexScan in case
of simple scan (without projection, reordering and qual). In such situation
we could use some kind of index only scan
(even better because we dont need index data) to avoid fetching tuples
while they are just thrown away by nodeLimit.
Patch is also availble on Github:
https://github.com/michail-nikolaev/postgres/commit/a368c3483250e4c02046d418a27091678cb963f4?diff=split
And some test here:
https://gist.github.com/michail-nikolaev/b7cbe1d6f463788407ebcaec8917d1e0
So, at the moment everything seems to work (check-world is ok too) and I
got next result for test ticket table:
| offset | master | patch
| 100 | ~1.3ms | ~0.7ms
| 1000 | ~5.6ms | ~1.1ms
| 10000 | ~46.7ms | ~3.6ms
To continue development I have following questions:
0) Maybe I missed something huge...
1) Is it required to support non-mvvc (dirty) snapshots? They are not
supported for IndexOnlyScan - not sure about IndexScan.
2) Should I try to pass informaiton about such optimisation to
planner/optimizer? It is not too easy with current desigh but seems
possible.
3) If so, should I add something to EXPLAIN?
4) If so, should I add some counters to EXPLAIN ANALYZE? (for example
number of heap fetch avoided).
5) Should I add description of optimisation to
https://www.postgresql.org/docs/10/static/queries-limit.html ?
6) Maybe you have some ideas for additional tests I need to add.
Thanks a lot.
[1]
https://www.postgresql.org/message-id/CAK-MWwQpZobHfuTtHj9%2B9G%2B5%3Dck%2BaX-ANWHtBK_0_D_qHYxWuw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w%40mail.gmail.com
Attachment | Content-Type | Size |
---|---|---|
offset_index_only.patch | application/x-patch | 15.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2018-01-31 22:51:51 | Re: proposal: alternative psql commands quit and exit |
Previous Message | Robert Haas | 2018-01-31 22:13:41 | Re: pgsql: doc: clearify trigger behavior for inheritance |