RE: Adding skip scan (including MDAM style range skip scan) to nbtree

From: <Masahiro(dot)Ikeda(at)nttdata(dot)com>
To: <pg(at)bowt(dot)ie>, <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: <Masao(dot)Fujii(at)nttdata(dot)com>
Subject: RE: Adding skip scan (including MDAM style range skip scan) to nbtree
Date: 2024-07-12 05:18:56
Message-ID: TYWPR01MB10982A413E0EC4088E78C0E11B1A62@TYWPR01MB10982.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Since I'd like to understand the skip scan to improve the EXPLAIN output
for multicolumn B-Tree Index[1], I began to try the skip scan with some
queries and look into the source code.

I have some feedback and comments.

(1)

At first, I was surprised to look at your benchmark result because the skip scan
index can improve much performance. I agree that there are many users to be
happy with the feature for especially OLAP use-case. I expected to use v18.

(2)

I found the cost is estimated to much higher if the number of skipped attributes
is more than two. Is it expected behavior?

# Test result. The attached file is the detail of tests.

-- Index Scan
-- The actual time is low since the skip scan works well
-- But the cost is higher than one of seqscan
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM test WHERE id3 = 101;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_id1_id2_id3 on public.test (cost=0.42..26562.77 rows=984 width=20) (actual time=0.051..15.533 rows=991 loops=1)
Output: id1, id2, id3, value
Index Cond: (test.id3 = 101)
Buffers: shared hit=4402
Planning:
Buffers: shared hit=7
Planning Time: 0.234 ms
Execution Time: 15.711 ms
(8 rows)

-- Seq Scan
-- actual time is high, but the cost is lower than one of the above Index Scan.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM test WHERE id3 = 101;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..12676.73 rows=984 width=20) (actual time=0.856..113.861 rows=991 loops=1)
Output: id1, id2, id3, value
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6370
-> Parallel Seq Scan on public.test (cost=0.00..11578.33 rows=410 width=20) (actual time=0.061..102.016 rows=330 loops=3)
Output: id1, id2, id3, value
Filter: (test.id3 = 101)
Rows Removed by Filter: 333003
Buffers: shared hit=6370
Worker 0: actual time=0.099..98.014 rows=315 loops=1
Buffers: shared hit=2066
Worker 1: actual time=0.054..97.162 rows=299 loops=1
Buffers: shared hit=1858
Planning:
Buffers: shared hit=19
Planning Time: 0.194 ms
Execution Time: 114.129 ms
(18 rows)

I look at btcostestimate() to find the reason and found the bound quals
and cost.num_sa_scans are different from my expectation.

My assumption is
* bound quals is id3=XXX (and id1 and id2 are skipped attributes)
* cost.num_sa_scans = 100 (=10*10 because assuming 10 primitive index scans
per skipped attribute)

But it's wrong. The above index scan result is
* bound quals is NULL
* cost.num_sa_scans = 1

As I know you said the below, but I'd like to know the above is expected or not.

> That approach seems far more practicable than preempting the problem
> during planning or during nbtree preprocessing. It seems like it'd be
> very hard to model the costs statistically. We need revisions to
> btcostestimate, of course, but the less we can rely on btcostestimate
> the better. As I said, there are no new index paths generated by the
> optimizer for any of this.

I couldn't understand why there is the below logic well.

btcostestimate()
(...omit...)
if (indexcol != iclause->indexcol)
{
/* no quals at all for indexcol */
found_skip = true;
if (index->pages < 100)
break;
num_sa_scans += 10 * (indexcol - iclause->indexcol); // why add minus value?
continue; // why skip to add bound quals?
}

(3)

Currently, there is an assumption that "there will be 10 primitive index scans
per skipped attribute". Is any chance to use pg_stats.n_distinct?

[1] Improve EXPLAIN output for multicolumn B-Tree Index
https://www.postgresql.org/message-id/flat/TYWPR01MB1098260B694D27758FE2BA46FB1C92%40TYWPR01MB10982.jpnprd01.prod.outlook.com

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachment Content-Type Size
compare_cost.sql application/octet-stream 2.3 KB
compare_cost_with_v2_patch.out application/octet-stream 9.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-07-12 05:21:09 Re: Allow non-superuser to cancel superuser tasks.
Previous Message Ashutosh Bapat 2024-07-12 05:12:35 Re: Test to dump and restore objects left behind by regression