Quick Links

[WIP PATCH] Index scan offset optimisation using visibility map

From:	Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	[WIP PATCH] Index scan offset optimisation using visibility map
Date:	2018-01-31 22:17:35
Message-ID:	CANtu0oi3a1Rf1PVsBufQbm+g9ytSv75+kp7kJYvK5C1qF_0Siw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello.

WIP-Patch for optimisation of OFFSET + IndexScan using visibility map.
Patch based on idea of Maxim Boguk [1] with some inspiration from Douglas
Doole [2].
---------
Everyone knows - using OFFSET (especially big) is not an good practice.
But in reality they widely used mostly for paging (because it is simple).

Typical situation is some table (for example tickets) with indexes used for
paging\sorting:

VACUUM FULL;
VACUUM ANALYZE ticket;
SET work_mem = '512MB';
SET random_page_cost = 1.0;

CREATE TABLE ticket AS
SELECT
id,
TRUNC(RANDOM() * 100 + 1) AS project_id,
NOW() + (RANDOM() * (NOW()+'365 days' - NOW())) AS created_date,
repeat((TRUNC(RANDOM() * 100 + 1)::text), 1000) as payload
FROM GENERATE_SERIES(1, 1000000) AS g(id);

CREATE INDEX simple_index ON ticket using btree(project_id, created_date);

And some typical query to do offset on tickets of some project with paging,
some filtration (based on index) and sorting:

SELECT * FROM ticket
WHERE project_id = ?
AND created_date > '20.06.2017'
ORDER BY created_date offset 500 limit 100;

At the current moment IndexScan node will be required to do 600 heap
fetches to execute the query.
But first 500 of them are just dropped by the NodeLimit.

The idea of the patch is to push down offset information in
ExecSetTupleBound (like it done for Top-sort) to IndexScan in case
of simple scan (without projection, reordering and qual). In such situation
we could use some kind of index only scan
(even better because we dont need index data) to avoid fetching tuples
while they are just thrown away by nodeLimit.

Patch is also availble on Github:
https://github.com/michail-nikolaev/postgres/commit/a368c3483250e4c02046d418a27091678cb963f4?diff=split
And some test here:
https://gist.github.com/michail-nikolaev/b7cbe1d6f463788407ebcaec8917d1e0

So, at the moment everything seems to work (check-world is ok too) and I
got next result for test ticket table:
| offset | master | patch
| 100 | ~1.3ms | ~0.7ms
| 1000 | ~5.6ms | ~1.1ms
| 10000 | ~46.7ms | ~3.6ms

To continue development I have following questions:
0) Maybe I missed something huge...
1) Is it required to support non-mvvc (dirty) snapshots? They are not
supported for IndexOnlyScan - not sure about IndexScan.
2) Should I try to pass informaiton about such optimisation to
planner/optimizer? It is not too easy with current desigh but seems
possible.
3) If so, should I add something to EXPLAIN?
4) If so, should I add some counters to EXPLAIN ANALYZE? (for example
number of heap fetch avoided).
5) Should I add description of optimisation to
https://www.postgresql.org/docs/10/static/queries-limit.html ?
6) Maybe you have some ideas for additional tests I need to add.

Thanks a lot.

[1]
https://www.postgresql.org/message-id/CAK-MWwQpZobHfuTtHj9%2B9G%2B5%3Dck%2BaX-ANWHtBK_0_D_qHYxWuw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w%40mail.gmail.com

Attachment	Content-Type	Size
offset_index_only.patch	application/x-patch	15.4 KB

Responses

Re: [WIP PATCH] Index scan offset optimisation using visibility map at 2018-02-05 20:36:52 from Andrey Borodin

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2018-01-31 22:51:51	Re: proposal: alternative psql commands quit and exit
Previous Message	Robert Haas	2018-01-31 22:13:41	Re: pgsql: doc: clearify trigger behavior for inheritance