Quick Links

Re: Improve EXPLAIN output for multicolumn B-Tree Index

From:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To:	Masahiro(dot)Ikeda(at)nttdata(dot)com
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com
Subject:	Re: Improve EXPLAIN output for multicolumn B-Tree Index
Date:	2024-06-21 12:29:12
Message-ID:	CAExHW5sdpKOO+cj9=UBx_vSv=0mt2ncjJyrpdwgmbxFmWY0THg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Jun 21, 2024 at 12:42 PM <Masahiro(dot)Ikeda(at)nttdata(dot)com> wrote:

> Hi,
>
>
>
> Regarding the multicolumn B-Tree Index, I'm considering
>
> if we can enhance the EXPLAIN output. There have been requests
>
> for this from our customer.
>
>
>
> As the document says, we need to use it carefully.
>
> > The exact rule is that equality constraints on leading columns,
>
> > plus any inequality constraints on the first column that does
>
> > not have an equality constraint, will be used to limit the portion
>
> > of the index that is scanned.
>
> *https://www.postgresql.org/docs/17/indexes-multicolumn.html
> <https://www.postgresql.org/docs/17/indexes-multicolumn.html>*
>
>
>
> However, it's not easy to confirm whether multi-column indexes are
>
> being used efficiently because we need to compare the index
>
> definitions and query conditions individually.
>
>
>
> For instance, just by looking at the following EXPLAIN result, we
>
> can't determine whether the index is being used efficiently or not
>
> at a glance. Indeed, the current index definition is not suitable
>
> for the query, so the cost is significantly high.
>
>
>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 =
> 101;
>
> QUERY
> PLAN
>
>
> ----------------------------------------------------------------------------------------------------------------------------
>
> Index Scan using test_idx on public.test (cost=0.42..12754.76 rows=1
> width=18) (actual time=0.033..54.115 rows=1 loops=1)
>
> Output: id1, id2, id3, value
>
> Index Cond: ((test.id1 = 1) AND (test.id3 = 101)) -- Is it
> efficient or not?
>
> Planning Time: 0.145 ms
>
> Execution Time: 54.150 ms
>
> (6 rows)
>
>
>
> So, I'd like to improve the output to be more user-friendly.
>
>
>
>
>
> # Idea
>
>
>
> I'm considering adding new information, "Index Bound Cond", which specifies
>
> what quals will be used for the boundary condition of the B-Tree index.
>
> (Since this is just my current idea, I'm open to changing the output.)
>
>
>
> Here is an example output.
>
>
>
> -- prepare for the test
>
> CREATE TABLE test (id1 int, id2 int, id3 int, value varchar(32));
>
> CREATE INDEX test_idx ON test(id1, id2, id3); --
> multicolumn B-Tree index
>
> INSERT INTO test (SELECT i % 2, i, i, 'hello' FROM
> generate_series(1,1000000) s(i));
>
> ANALYZE;
>
>
>
> -- explain
>
>
>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 =
> 101;
>
> QUERY
> PLAN
>
>
> -----------------------------------------------------------------------------------------------------------------------
>
> Index Scan using test_idx on public.test (cost=0.42..8.45 rows=1
> width=18) (actual time=0.046..0.047 rows=1 loops=1)
>
> Output: id1, id2, id3, value
>
> Index Cond: ((test.id1 = 1) AND (test.id2 = 101))
>
> Index Bound Cond: ((test.id1 = 1) AND (test.id2 = 101)) -- The B-Tree
> index is used efficiently.
>
> Planning Time: 0.124 ms
>
> Execution Time: 0.076 ms
>
> (6 rows)
>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 =
> 101;
>
> QUERY
> PLAN
>
>
> ----------------------------------------------------------------------------------------------------------------------------
>
> Index Scan using test_idx on public.test (cost=0.42..12754.76 rows=1
> width=18) (actual time=0.033..54.115 rows=1 loops=1)
>
> Output: id1, id2, id3, value
>
> Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
>
> Index Bound Cond: (test.id1 = 1) -- The B-tree
> index is *not* used efficiently
>
> -- compared to
> the previous execution conditions,
>
> -- because it
> differs from "Index Cond".
>
> Planning Time: 0.145 ms
>
> Execution Time: 54.150 ms
>
> (6 rows)
>
>
>
>
>
> # PoC patch
>
>
>
> The PoC patch makes the following changes:
>
>
>
> * Adds a new variable related to bound conditions
>
> to IndexPath, IndexScan, IndexOnlyScan, and BitmapIndexScan
>
> * Adds quals for bound conditions to IndexPath when estimating cost, since
>
> the B-Tree index considers the boundary condition in btcostestimate()
>
> * Adds quals for bound conditions to the output of EXPLAIN
>
>
>
>
>
>
>
> Thank you for reading my suggestion. Please feel free to comment.
>
>
>
> * Is this feature useful? Is there a possibility it will be accepted?
>
> * Are there any other ideas for determining if multicolumn indexes are
>
> being used efficiently? Although I considered calculating the efficiency
> using
>
> pg_statio_all_indexes.idx_blks_read and pg_stat_all_indexes.idx_tup_read,
>
> I believe improving the EXPLAIN output is better because it can be output
>
> per query and it's more user-friendly.
>
> * Is "Index Bound Cond" the proper term?I also considered changing
>
> "Index Cond" to only show quals for the boundary condition and adding
>
> a new term "Index Filter".
>
> * Would it be better to add new interfaces to Index AM? Is there any case
>
> to output the EXPLAIN for each index context? At least, I think it's
> worth
>
> considering whether it's good for amcostestimate() to modify the
>
> IndexPath directly as the PoC patch does.
>
>
>

I am unable to decide whether reporting the bound quals is just enough to
decide the efficiency of index without knowing the difference in the number
of index tuples selectivity and heap tuple selectivity. The difference
seems to be a better indicator of index efficiency whereas the bound quals
will help debug the in-efficiency, if any.

Also, do we want to report bound quals even if they are the same as index
conditions or just when they are different?
--
Best Wishes,
Ashutosh Bapat

In response to

Improve EXPLAIN output for multicolumn B-Tree Index at 2024-06-21 07:12:25 from Masahiro.Ikeda

Responses

RE: Improve EXPLAIN output for multicolumn B-Tree Index at 2024-06-24 02:38:32 from Masahiro.Ikeda

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bertrand Drouvot	2024-06-21 13:22:43	Re: Avoid orphaned objects dependencies, take 3
Previous Message	Amit Langote	2024-06-21 12:18:15	Re: SQL/JSON query functions context_item doc entry and type requirement