Re: Detoasting optionally to make Explain-Analyze less misleading

From: stepan rutz <stepan(dot)rutz(at)gmx(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detoasting optionally to make Explain-Analyze less misleading
Date: 2023-11-02 19:09:30
Message-ID: 9126b7da-204b-4701-8e2b-37fb824a0cf4@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Thomas,

you are right of course. Thanks!

I have attached a new version of the patch that supports the syntax like
suggested. The previous patch was insonsistent in style indeed.

explain (analyze, serialize)

and

explain (analyze, serialize binary)

That doesn't make too much of a difference for most scenarios I am
certain. However the the seralize option itself does. Mostly because it
performs the detoasting and that was a trap for me in the past with just
plain analyze.

Eg this scenario really is not too far fetched in a world where people
have large JSONB values.

db1=# create table test(id bigint, val text);

db1=# insert into test(val) select string_agg(s::text, ',') from (select
generate_series(1, 10_000_000) as s) as a1;

now we have a cell that has roughly 80Mb in it. A large detoasting that
will happen in reallife but in explain(analyze).

and then...

db1=# explain (analyze) select * from test;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
time=0.018..0.020 rows=1 loops=1)
 Planning Time: 0.085 ms
 Execution Time: 0.044 ms
(3 rows)

db1=# explain (analyze, serialize) select * from test;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
time=0.023..0.027 rows=1 loops=1)
 Planning Time: 0.077 ms
 Execution Time: 303.281 ms
 Serialized Bytes: 78888953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
(4 rows)

db1=#

So the explain(analyze) does not process the ~80 MB in 0.044ms in any
way of course.

Actually I could print the serialized bytes using 1. grouping-separators
(eg 78_888_953) or 2. in the way pg_size_pretty does it.

If doing it the pg_size_pretty way I am uncertain if it would be ok to
query the actual pg_size_pretty function via its (certainly frozen) oid
of 3166 and do OidFunctionCall1(3166...) to invoke it. Otherwise I'd say
it would be nice if the code from that function would be made available
as a utility function for all c-code.  Any suggestions on this topic?

Regards,

/Stepan

On 02.11.23 18:49, Tomas Vondra wrote:
> Hi,
>
> On 9/15/23 22:09, stepan rutz wrote:
>> Hi,
>>
>> please see a revised version yesterday's mail. The patch attached now
>> provides the following:
>>
>> EXPLAIN(ANALYZE,SERIALIZE)
>>
>> and
>>
>> EXPLAIN(ANALYZE,SERIALIZEBINARY)
>>
> I haven't looked at the patch in detail yet, but this option name looks
> a bit strange/inconsistent. Either it should be SERIALIZE_BINARY (to
> match other multi-word options), or maybe there should be just SERIALIZE
> with a parameter to determine text/binary (like FORMAT, for example).
>
> So we'd do either
>
> EXPLAIN (SERIALIZE)
> EXPLAIN (SERIALIZE TEXT)
>
> to get serialization to text (which I guess 99% of people will do), or
>
> EXPLAIN (SERIALIZE BINARY)
>
> to get binary.
>
>
> regards
>

Attachment Content-Type Size
0006_explain_analyze_and_serialize.patch text/x-patch 17.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2023-11-02 19:25:00 Re: Add new option 'all' to pg_stat_reset_shared()
Previous Message John Morris 2023-11-02 19:00:56 Re: Including Doxyfile and Meson script for docs into main source tree