From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Date: 2024-12-03 15:01:39
Message-ID: CAEze2WgLXNCBbq1ndSYxGcNid1Rrmqrf8Jy+hMgOsb9TPUnwLA@mail.gmail.com
Lists: pgsql-hackers
On Thu, 28 Nov 2024 at 22:09, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>
> Hi!
>
> On 27.11.2024 16:36, Matthias van de Meent wrote:
>> On Wed, 27 Nov 2024 at 14:22, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>> Sorry it took me so long to answer; I had some minor health complications.
>>>
>>> On 12.11.2024 23:00, Peter Geoghegan wrote:
>>>
>>> On Sun, Nov 10, 2024 at 2:00 PM Alena Rybakina
>>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>>
>>> Or maybe I was affected by fatigue, but to be honest I don't understand this point. From the documentation and your first letter, I see that it specifies how many times in total the tuple search is performed during execution of the index scan. Is that not right?
>>>
>>> Well, nodes that appear on the inner side of a nested loop join (and
>>> in a few other contexts) generally have their row counts (and a few
>>> other things) divided by the total number of executions. The idea is
>>> that we're showing the average across all executions of the node -- if
>>> the user wants the true absolute number, they're expected to multiply
>>> nrows by nloops themselves. This is slightly controversial behavior,
>>> but it is long established (weirdly, we never divide by nloops for
>>> "Buffers").
>>>
>>> I understand what you mean; I have faced this situation before, when I saw far more actual rows than there possibly could be, caused by the number of tuples scanned per cycle. [0]
>>>
>>> [0] https://www.postgresql.org/message-id/flat/9f4a159b-f527-465f-b82e-38b4b7df812f(at)postgrespro(dot)ru
>>>
>>> Initial versions of my patch didn't do this. The latest version does
>>> divide like this, though. In general it isn't all that likely that an
>>> inner index scan would have more than a single primitive index scan,
>>> in any case, so which particular behavior I use here (divide vs don't
>>> divide) is not something that I feel strongly about.
>>>
>>> I think we should divide them: by dividing the total buffer usage by the number of loops, the user finds the average buffer consumption per loop. This gives them a clearer picture of the resource intensity per basic unit of work.
>> I disagree; I think the whole "dividing by number of loops and
>> rounding up to integer" was the wrong choice for tuple count, as that
>> makes it difficult if not impossible to determine the actual produced
>> count when it's less than the number of loops. Data is lost in the
>> rounding/processing, and I don't want to have lost that data.
>>
>> Same applies for ~scans~ searches: If we do an index search, we should
>> show it in the count as total sum, not partial processed value. If a
>> user is interested in per-loopcount values, then they can derive that
>> value from the data they're presented with; but that isn't true when
>> we present only the divided-and-rounded value.
>>
> To be honest, I don't understand how this would be helpful, because there
> is an uneven distribution of buffer usage from cycle to cycle, isn't there?
I'm sorry, I don't quite understand what you mean by cycle here.
> I thought that dividing the memory by the number of cycles helps us
> normalize the metric to account for the repeated iterations. This gives
> us a clearer picture of the resource intensity per basic unit of work,
> rather than just the overall total. Each loop may consume a different
> amount of buffer space, but by averaging it out, we smooth those
> fluctuations into a more representative measure.
The issue I see here is that users can get those numbers from raw
results, but they can't get the raw (more accurate) data from the
current output; if we only show processed data (like the 'rows' metric
in text output, which is a divided-and-rounded value) you can't get
the original data back with good confidence.
E.g., I have a table 'twentyone' with values 1..21, and I left join it
to a table 'ten' with values 1..10. The current text explain output
-once the planner is convinced to execute (nested loop left join
(seqscan 'twentyone'), (index scan 'ten'))- will show that the index
scan path produced 0 rows, which is clearly wrong, and I can't get the
original value back accurately by multiplying rows with loops, due to
the rounding.
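A minimal reproduction sketch (hypothetical; the exact enable_* settings
needed to force that plan may vary):

    CREATE TABLE twentyone AS SELECT generate_series(1, 21) AS i;
    CREATE TABLE ten AS SELECT generate_series(1, 10) AS i;
    CREATE INDEX ON ten (i);
    ANALYZE twentyone, ten;
    SET enable_hashjoin = off;
    SET enable_mergejoin = off;
    SET enable_seqscan = off;  -- 'twentyone' has no index, so it is still seq-scanned
    EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
    SELECT * FROM twentyone LEFT JOIN ten USING (i);

The inner index scan then reports something like "actual rows=0
loops=21": the true total of 10 rows, divided by 21 loops and rounded,
becomes 0, and 0 * 21 can't recover it.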
> Moreover, this does not correspond to another metric that is nearby:
> the number of rows processed by the algorithm for the inner node.
It doesn't correspond closely to that anyway, as the `rows` metric
counts neither tuples that were accessed but didn't match the index
quals, nor heap tuples filtered out by rechecks.
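To make that concrete, a hypothetical fragment of a lossy bitmap scan:

    Bitmap Heap Scan on t (actual rows=10 loops=1)
      Recheck Cond: (x < 100)
      Rows Removed by Index Recheck: 90

The `rows` metric reflects only the 10 surviving tuples; the 90
rechecked-and-rejected tuples (and their buffer accesses) are not part
of it.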
> Won't a user who evaluates the query plan be confused by such a
> discrepancy?
I think users will be more confused by a discrepancy between buffer
accesses and index searches (which are more closely related to each
other) than by a discrepancy between index searches and the
rounded-average-number-of-tuples-produced-per-loop, or the discrepancy
between not-quite-average-tuples-produced-per-loop and the "heap
fetches" counter of an IndexOnlyScan, etc.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)