Re: Improve EXPLAIN output for multicolumn B-Tree Index

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Masahiro(dot)Ikeda(at)nttdata(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com
Subject: Re: Improve EXPLAIN output for multicolumn B-Tree Index
Date: 2024-06-24 12:55:35
Message-ID: CAExHW5sZXUQDoV7VkdMGyFHtWLvkZZgfg05jxyW09jXUjtawEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 24, 2024 at 8:08 AM <Masahiro(dot)Ikeda(at)nttdata(dot)com> wrote:

> > I am unable to decide whether reporting the bound quals is just enough
> to decide the efficiency of index without knowing the difference in the
> number of index tuples selectivity and heap tuple selectivity. The
> difference seems to be a better indicator of index efficiency whereas the
> bound quals will help debug the in-efficiency, if any.
> > Also, do we want to report bound quals even if they are the same as
> index conditions or just when they are different?
>
> Thank you for your comment. After receiving your comment, I thought it
> would be better to also report information that would make the difference
> in selectivity understandable. One idea I had is to output the number of
> index tuples inefficiently extracted, like “Rows Removed by Filter”. Users
> can check the selectivity and efficiency by looking at the number.
>
> Also, I thought it would be better to change the way bound quals are
> reported to align with the "Filter". I think it would be better to modify
> it so that it does not output when the bound quals are the same as the
> index conditions.
>
> In my local PoC patch, I have modified the output as follows, what do you
> think?
>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 =
> 101;
> QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------------------
> Index Scan using test_idx on ikedamsh.test (cost=0.42..8.45 rows=1
> width=18) (actual time=0.082..0.086 rows=1 loops=1)
> Output: id1, id2, id3, value
> Index Cond: ((test.id1 = 1) AND (test.id2 = 101)) -- If it’s
> efficient, the output won’t change.
> Planning Time: 5.088 ms
> Execution Time: 0.162 ms
> (5 rows)
>

This looks fine. We may highlight in the documentation that lack of Index
bound quals in EXPLAIN output indicate that they are same as Index Cond:.
Other idea is to use Index Cond and bound quals as property name but that's
too long.

>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 =
> 101;
> QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------------------------
> Index Scan using test_idx on ikedamsh.test (cost=0.42..12630.10 rows=1
> width=18) (actual time=0.175..279.819 rows=1 loops=1)
> Output: id1, id2, id3, value
> Index Cond: (test.id1 = 1) -- Change the output. Show
> only the bound quals.
> Index Filter: (test.id3 = 101) -- New. Output quals which
> are not used as the bound quals
> Rows Removed by Index Filter: 499999 -- New. Output when ANALYZE
> option is specified
> Planning Time: 0.354 ms
> Execution Time: 279.908 ms
> (7 rows)
>

I don't think we want to split these clauses. Index Cond should indicate
the conditions applied to the index scan. Bound quals should be listed
separately even though they will have an intersection with Index Cond. I am
not sure whether Index Filter is the right name, maybe Index Bound Cond:
But I don't know this area enough to make a final call.

About Rows Removed by Index Filter: it's good to provide a number when
ANALYZE is specified, but it will be also better to specify what was
estimated. We do that for (cost snd rows etc.) but doing that somewhere in
the plan output may not have a precedent. I think we should try that and
see what others think.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2024-06-24 12:56:03 Re: Pgoutput not capturing the generated columns
Previous Message Jelte Fennema-Nio 2024-06-24 12:41:53 Re: Improve EXPLAIN output for multicolumn B-Tree Index