From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, stepan rutz <stepan(dot)rutz(at)gmx(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Detoasting optionally to make Explain-Analyze less misleading |
Date: | 2023-11-02 21:33:57 |
Message-ID: | CAEze2Wg8UMUuT9sHpWGCeBzm-nHTbQcnPsSob4ynAPXjiD-G=Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 2 Nov 2023 at 22:25, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>
>
> On 11/2/23 21:02, Matthias van de Meent wrote:
> > On Thu, 2 Nov 2023 at 20:32, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >> On 11/2/23 20:09, stepan rutz wrote:
> >>> db1=# explain (analyze, serialize) select * from test;
> >>> QUERY PLAN
> >>> ---------------------------------------------------------------------------------------------------
> >>> Seq Scan on test (cost=0.00..22.00 rows=1200 width=40) (actual
> >>> time=0.023..0.027 rows=1 loops=1)
> >>> Planning Time: 0.077 ms
> >>> Execution Time: 303.281 ms
> >>> Serialized Bytes: 78888953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
> >> [...]
> >> BTW if you really want to print amount of memory, maybe print it in
> >> kilobytes, like every other place in explain.c?
> >
> > Isn't node width in bytes, or is it an opaque value not to be
> > interpreted by users? I've never really investigated that part of
> > Postgres' explain output...
> >
>
> Right, "width=" is always in bytes. But fields like amount of sorted
> data is in kB, and this seems closer to that.
>
> >> Also, explain generally
> >> prints stuff in "key: value" style (in text format).
> >
> > That'd be key: metrickey=metricvalue for expanded values like those in
> > plan nodes and the buffer usage, no?
> >
>
> Possibly. But the proposed output does neither. Also, it starts with
> "Serialized Bytes" but then prints info about bandwidth.
>
>
> >>> Serialized Bytes: 78888953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
> >
> > I was thinking more along the lines of something like this:
> >
> > [...]
> > Execution Time: xxx ms
> > Serialization: time=yyy.yyy (in ms) size=yyy (in KiB, or B) mode=text
> > (or binary)
> > > This is significantly different from your output, as it doesn't hide
> > the measured time behind a lossy calculation of bandwidth, but gives
> > the measured data to the user; allowing them to derive their own
> > precise bandwidth if they're so inclined.
> >
>
> Might work. I'm still not convinced we need to include the mode, or that
> the size is that interesting/useful, though.
I'd say size is interesting for systems where network bandwidth is
constrained, but CPU isn't. We currently only show estimated widths &
accurate number of tuples returned, but that's not an accurate
explanation of why your 30-row 3GB resultset took 1h to transmit on a
10mbit line - that is only explained by the bandwidth of your
connection and the size of the dataset. As we can measure the size of
the returned serialized dataset here, I think it's in the interest of
any debugability to also present it to the user. Sadly, we don't have
good measures of bandwidth without sending that data across, so that's
the only metric that we can't show here, but total query data size is
definitely something that I'd be interested in here.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2023-11-02 21:51:12 | Re: Pre-proposal: unicode normalized text |
Previous Message | Tomas Vondra | 2023-11-02 21:25:16 | Re: Detoasting optionally to make Explain-Analyze less misleading |