RE: Improve EXPLAIN output for multicolumn B-Tree Index

From: <Masahiro(dot)Ikeda(at)nttdata(dot)com>
To: <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: <pgsql-hackers(at)lists(dot)postgresql(dot)org>, <Masao(dot)Fujii(at)nttdata(dot)com>
Subject: RE: Improve EXPLAIN output for multicolumn B-Tree Index
Date: 2024-06-24 02:38:32
Message-ID: TYWPR01MB109825DAFBCA3233CA7EE77D7B1D42@TYWPR01MB10982.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I am unable to decide whether reporting the bound quals is just enough to decide the efficiency of index without knowing the difference in the number of index tuples selectivity and heap tuple selectivity. The difference seems to be a better indicator of index efficiency whereas the bound quals will help debug the in-efficiency, if any. 
> Also, do we want to report bound quals even if they are the same as index conditions or just when they are different?

Thank you for your comment. After receiving your comment, I thought it would be better to also report information that would make the difference in selectivity understandable. One idea I had is to output the number of index tuples inefficiently extracted, like “Rows Removed by Filter”. Users can check the selectivity and efficiency by looking at the number.

Also, I thought it would be better to change the way bound quals are reported to align with the "Filter". I think it would be better to modify it so that it does not output when the bound quals are the same as the index conditions.

In my local PoC patch, I have modified the output as follows, what do you think?

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Scan using test_idx on ikedamsh.test (cost=0.42..8.45 rows=1 width=18) (actual time=0.082..0.086 rows=1 loops=1)
Output: id1, id2, id3, value
Index Cond: ((test.id1 = 1) AND (test.id2 = 101)) -- If it’s efficient, the output won’t change.
Planning Time: 5.088 ms
Execution Time: 0.162 ms
(5 rows)

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using test_idx on ikedamsh.test (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
Output: id1, id2, id3, value
Index Cond: (test.id1 = 1) -- Change the output. Show only the bound quals.
Index Filter: (test.id3 = 101) -- New. Output quals which are not used as the bound quals
Rows Removed by Index Filter: 499999 -- New. Output when ANALYZE option is specified
Planning Time: 0.354 ms
Execution Time: 279.908 ms
(7 rows)

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-06-24 02:51:26 Re: Pgoutput not capturing the generated columns
Previous Message Zhijie Hou (Fujitsu) 2024-06-24 02:09:27 RE: Conflict detection and logging in logical replication