Re: Detoasting optionally to make Explain-Analyze less misleading

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: stepan rutz <stepan(dot)rutz(at)gmx(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detoasting optionally to make Explain-Analyze less misleading
Date: 2023-11-02 22:24:36
Message-ID: 14746b40-16a8-b53e-18a6-f2872e696e34@enterprisedb.com
Lists: pgsql-hackers

On 11/2/23 22:33, Matthias van de Meent wrote:
> On Thu, 2 Nov 2023 at 22:25, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>>
>>
>> On 11/2/23 21:02, Matthias van de Meent wrote:
>>> On Thu, 2 Nov 2023 at 20:32, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>> On 11/2/23 20:09, stepan rutz wrote:
>>>>> db1=# explain (analyze, serialize) select * from test;
>>>>> QUERY PLAN
>>>>> ---------------------------------------------------------------------------------------------------
>>>>> Seq Scan on test (cost=0.00..22.00 rows=1200 width=40) (actual
>>>>> time=0.023..0.027 rows=1 loops=1)
>>>>> Planning Time: 0.077 ms
>>>>> Execution Time: 303.281 ms
>>>>> Serialized Bytes: 78888953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
>>>> [...]
>>>> BTW if you really want to print the amount of memory, maybe print it in
>>>> kilobytes, like every other place in explain.c?
>>>
>>> Isn't node width in bytes, or is it an opaque value not to be
>>> interpreted by users? I've never really investigated that part of
>>> Postgres' explain output...
>>>
>>
>> Right, "width=" is always in bytes. But fields like the amount of sorted
>> data are in kB, and this seems closer to that.
>>
>>>> Also, explain generally
>>>> prints stuff in "key: value" style (in text format).
>>>
>>> That'd be key: metrickey=metricvalue for expanded values like those in
>>> plan nodes and the buffer usage, no?
>>>
>>
>> Possibly. But the proposed output does neither. Also, it starts with
>> "Serialized Bytes" but then prints info about bandwidth.
>>
>>
>>>>> Serialized Bytes: 78888953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
>>>
>>> I was thinking more along the lines of something like this:
>>>
>>> [...]
>>> Execution Time: xxx ms
>>> Serialization: time=yyy.yyy (in ms) size=yyy (in KiB, or B) mode=text
>>> (or binary)
>>>
>>> This is significantly different from your output, as it doesn't hide
>>> the measured time behind a lossy calculation of bandwidth, but gives
>>> the measured data to the user; allowing them to derive their own
>>> precise bandwidth if they're so inclined.
>>>
>>
>> Might work. I'm still not convinced we need to include the mode, or that
>> the size is that interesting/useful, though.
>
> I'd say size is interesting for systems where network bandwidth is
> constrained, but CPU isn't. We currently only show estimated widths &
> accurate number of tuples returned, but that's not an accurate
> explanation of why your 30-row 3 GB result set took 1h to transmit on a
> 10 Mbit line - that is only explained by the bandwidth of your
> connection and the size of the dataset. As we can measure the size of
> the returned serialized dataset here, I think it's in the interest of
> debuggability to also present it to the user. Sadly, we don't have
> good measures of bandwidth without sending that data across, so that's
> the only metric that we can't show here, but total query data size is
> definitely something that I'd be interested in here.

Yeah, I agree with that.
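
To make the arithmetic concrete, here is a rough sketch (not part of the
patch; the function names are made up for this illustration). With size and
time reported separately, anyone who wants a bandwidth figure can derive it,
and the size alone is enough to estimate transfer time over a given link.
The inputs are the numbers from the example output quoted above. Dividing by
the total Execution Time and by 1024*1024 is an assumption on my part, one
that happens to reproduce the printed 248.068 figure.

```python
def bandwidth_mib_per_sec(size_bytes: int, time_ms: float) -> float:
    """Throughput in MiB/s derived from a measured size and elapsed time."""
    seconds = time_ms / 1000.0
    return size_bytes / seconds / (1024 * 1024)


def transfer_seconds(size_bytes: int, link_bits_per_sec: float) -> float:
    """Idealized time to push size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_sec


# Values from the quoted EXPLAIN output (assumed inputs, see note above).
serialized_bytes = 78888953
execution_time_ms = 303.281

bw = bandwidth_mib_per_sec(serialized_bytes, execution_time_ms)
print(f"derived bandwidth: {bw:.3f} MiB/s")

# The same result set on a 10 Mbit/s link (decimal megabits, no overhead).
slow = transfer_seconds(serialized_bytes, 10_000_000)
print(f"estimated transfer on 10 Mbit/s: {slow:.1f} s")
```

The second number is the one the plan cannot show by itself: the query
finishes in about 0.3 s locally, but the serialized size implies roughly a
minute on a slow link.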

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
