Re: explain analyze rows=%.0f

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Guillaume Lelarge <guillaume(at)lelarge(dot)info>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, vignesh C <vignesh21(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: explain analyze rows=%.0f
Date: 2025-02-10 21:14:47
Message-ID: 6acda85e-7af0-49fb-a4b1-13cd9f584f99@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 10.02.2025 23:43, Matheus Alcantara wrote:
>>>> When the total number of returned tuples is less than the number of
>>>> loops currently shows 'rows = 0'. This can mislead users into thinking
>>>> that no rows were returned at all, even though some might have appeared
>>>> occasionally.
>>>>
>>> I think that this can happen when the returned rows and the loops are small
>>> enough to result in a 'row' value like 0.00045? I'm not sure if we have
>>> "bigger" values (e.g 1074(ntuples) / 117(nloops) which would result in 9.17
>>> rows) this would also be true, what do you think? If you could provide
>>> an example of this case would be great!
>>
>> Based on what was discussed earlier in the thread, there are cases with
>> large loops [0]. However, I believe it's better not to display average
>> rows with excessively long digits or in scientific notation. And, of
>> course, I agree with you regarding small values. I think we should also
>> add a check to ensure that the total rows is actually greater than zero.
>> When the total rows is zero, we could simply display it as an integer
>> without decimals. It could help users average rows is very small but not
>> zero. What do you think about this approach?
>>
> Yeah, I agree with you about the long digits. My question is more about why do
> we need the planstate->instrument->ntuples < nloops check? I tried to remove
> this check and I got a lot of EXPLAIN output that shows 'rows' values with .00,
> so I'm just trying to understand the reason. From what I've understood about
> this thread is that just avoiding .00 decimals of 'rows' values that could be
> just integers would be enough, is that right or I'm missing something here? I'm
> just worried if we could have a scenario where nloops > 1 &&
> planstate->instrument->ntuples < nloops which would make the 'rows' not be
> formatted correctly.

Sorry for missing your question earlier. If you notice in the code
above, the variable(average) 'rows' is defined as:

double rows = planstate->instrument->ntuples / nloops;

This represents the total rows divided by the number of loops. The
condition means that variable 'rows' will always  between zero and one.
Therefore, the average rows under such conditions cannot be greater than
or even equal to one. I wrote this condition specifically to avoid the
verbose expression 'rows > 0 && rows < 1'. However, since this might not
be obvious to everyone, perhaps it'd be better to write is using 'rows'
directly or add a comment explaining this logic.

>>> - executing the index scans on <literal>tenk2</literal>.
>>> + executing the index scans on <literal>tenk2</literal>. If a subplan node
>>> + is executed multiple times and the average number of rows is less than one,
>>> + the rows and <literal>loops</literal> values are shown as a
>>> decimal fraction
>>> + (with two digits after the decimal point) to indicate that some rows
>>> + were actually processed rather than simply rounding down to zero.
>>>
>>> * I think that it would be good to mention what a 'row' value in
>>> decimal means. For
>>> example, if its says "0.1 rows" the user should assume that typically 0 rows
>>> will be returned but sometimes it can return 1 or more.
>>>
>>> * There are more spaces than necessary before "If a subplan node ..."
>>>
>>> * Maybe wrap 'rows' with <literal> </literal>?
>>>
>> I agree with the last two points. As for the first one—maybe we could
>> simply state that the average rows value can be decimal, especially for
>> very small values?
>>
> I'm just not sure about the "small values"; the 'rows' in decimal will only
> happen with small values? What would be a "small value" in this context? My main
> point here is more that I think that it would be good to mention *why* the
> 'rows' can be decimal, not just describe that it could be decimal.
>

As for 'small values', it means that the average rows is between zero
and one, to avoid rounding errors and misunderstanding. I think this
would be ideal.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2025-02-10 21:22:21 Re: BitmapHeapScan streaming read user and prelim refactoring
Previous Message Nathan Bossart 2025-02-10 20:52:46 Re: Track the amount of time waiting due to cost_delay