
Re: machine-readable explain output

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output
Date:	2009-06-16 13:22:27
Message-ID:	603c8f070906160622i384839d8t1ebe8c7011f86c35@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 16, 2009 at 8:53 AM, Andres Freund<andres(at)anarazel(dot)de> wrote:
> On 06/16/2009 02:14 PM, Greg Stark wrote:
>>
>> On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres(at)anarazel(dot)de>
>> wrote:
>>>
>>> <Startup-Cost>1710.98</Startup-Cost>
>>> <Total-Cost>1710.98</Total-Cost>
>>> <Plan-Rows>72398</Plan-Rows>
>>> <Plan-Width>4</Plan-Width>
>>> <Actual-Startup-Time>136.595</Actual-Startup-Time>
>>> <Actual-Total-Time>136.595</Actual-Total-Time>
>>> <Actual-Rows>72398</Actual-Rows>
>>> <Actual-Loops>1</Actual-Loops>
>>
>> XML's not really my thing currently but it sure seems strange to me to
>> have *everything* be a separate tag like this. Doesn't XML do
>> attributes too? I would have thought to use child tags like this only
>> for things that have some further structure.
>>
>> I would have expected something like:
>>
>> <join>
>> <scan type=sequential source="foo.bar">
>> <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
>> <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>
>> </scan>
>> <scan type=function source="foo.bar($1)">
>> <parameters>
>> <parameter name="$1" expression="...."></>
>> </parameters>
>> </scan>
>> </join>
>>
>>
>> This would allow something like a graphical explain plan to still make
>> sense of a plan even if it finds a node it doesn't recognize. It would
>> still know generally what to do with a "scan" node or a "join" node
>> even if it is a new type of scan or join.

As long as you understand how the current code uses <Plan> and
<Plans>, you can do this just as well with the current implementation.
Each plan node gets a <Plan>. If there are any plans "under" it, it
gets a <Plans> child which contains those. Whether you put the
additional details into attributes or other tags is irrelevant. As to
why I chose to do it this way, I had a couple of reasons:

1. It didn't seem very wise to go with the approach of trying to do
EVERYTHING with attributes. If I did that, then I'd either get really
long lines that were not easily readable, or I'd have to write some
kind of complicated line wrapping code (which didn't seem to make a
lot of sense for a machine-readable format). The current format isn't
the most beautiful thing I've ever seen, but you don't need a parser
to make sense of it, just a bit of patience.

2. I wanted the JSON output and the XML output to be similar, and that
seemed much easier with this design.

3. We have existing precedent for this design pattern in, e.g., table_to_xml:

http://www.postgresql.org/docs/current/interactive/functions-xml.html
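
As a rough illustration of the <Plan>/<Plans> nesting described above, here is a minimal Python sketch that walks such a document without knowing any node type in advance and produces a parallel JSON structure. The sample document is made up for illustration: the cost tag names come from this thread, while <Node-Type> and <New-Unknown-Stat> are assumptions, and the JSON shown is not necessarily what the patch emits.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample; only the Plan/Plans nesting mirrors the thread.
SAMPLE_XML = """
<Plan>
  <Node-Type>Hash Join</Node-Type>
  <Startup-Cost>1710.98</Startup-Cost>
  <Plans>
    <Plan>
      <Node-Type>Seq Scan</Node-Type>
      <Startup-Cost>0.00</Startup-Cost>
    </Plan>
    <Plan>
      <Node-Type>Hash</Node-Type>
      <Startup-Cost>1710.98</Startup-Cost>
      <New-Unknown-Stat>42</New-Unknown-Stat>
    </Plan>
  </Plans>
</Plan>
"""


def plan_to_dict(plan):
    """Convert one <Plan> element into a dict, recursing into <Plans>."""
    node = {}
    for child in plan:
        if child.tag == "Plans":
            # Child plans live under <Plans>; each one is itself a <Plan>.
            node["Plans"] = [plan_to_dict(sub) for sub in child.findall("Plan")]
        else:
            # Any other tag is treated as a scalar property, known or not.
            node[child.tag] = (child.text or "").strip()
    return node


tree = plan_to_dict(ET.fromstring(SAMPLE_XML.strip()))
print(json.dumps(tree, indent=2))
```

An unrecognized tag such as the made-up <New-Unknown-Stat> is simply carried along as another scalar property, so a consumer can still display it.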

> While that also looks sensible the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed into the
> 'attribute' format. As a quickly contrived example you could have io
> statistics over time like:
> <iostat>
> <stat time="10" name=pagefault>...</stat>
> <stat time="20" name=pagefault>...</stat>
> <stat time="30" name=pagefault>...</stat>
> </iostat>
>
> Something like that would be harder with your variant.
>
> Structuring it in tags like suggested above:
> <Plan-Estimates>
> <Startup-Cost>...</Startup-Cost>
> ...
> </Plan-Estimates>
> <Execution-Cost>
> <Startup-Cost>...</Startup-Cost>
> ...
> </Execution-Cost>
>
> Enables displaying unknown 'scalar' values just like your variant and also
> allows more structured values.
>
> It would be interesting to get somebody having used the old explain in an
> automated fashion into this discussion...

Well, one problem with this is that the actual values are not costs,
but times, and the estimated values are not times, but costs. The
planner estimates the cost of operations on an arbitrary scale where
the cost of a sequential page fetch is 1.0. When we measure actual
times, they are in milliseconds. There is no point that I can see in
making it appear that those are the same thing. Observe the current
output:

explain analyze select 1;
QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.007 rows=1 loops=1)
Total runtime: 0.243 ms
(2 rows)
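
To make the unit distinction concrete, here is a small Python sketch that pulls the two groups out of the plan line above and keeps them apart: estimates as costs on the planner's arbitrary scale, actuals as times in milliseconds. The regular expression and the key names are assumptions written for this one sample line, not a general parser for the text format.

```python
import re

# The plan line from the sample output above, on a single line.
LINE = ("Result (cost=0.00..0.01 rows=1 width=0) "
        "(actual time=0.005..0.007 rows=1 loops=1)")

# Assumed pattern: the "actual" group is optional, since plain EXPLAIN omits it.
PATTERN = re.compile(
    r"\(cost=(?P<startup_cost>\d+\.\d+)\.\.(?P<total_cost>\d+\.\d+) "
    r"rows=(?P<plan_rows>\d+) width=(?P<plan_width>\d+)\)"
    r"(?: \(actual time=(?P<startup_ms>\d+\.\d+)\.\.(?P<total_ms>\d+\.\d+) "
    r"rows=(?P<actual_rows>\d+) loops=(?P<loops>\d+)\))?"
)


def parse_plan_line(line):
    """Split a plan line into estimates (cost units) and actuals (ms)."""
    m = PATTERN.search(line)
    if m is None:
        return None
    g = m.groupdict()
    estimate = {
        "startup_cost": float(g["startup_cost"]),  # planner cost units
        "total_cost": float(g["total_cost"]),      # planner cost units
        "rows": int(g["plan_rows"]),
        "width": int(g["plan_width"]),
    }
    actual = None
    if g["startup_ms"] is not None:
        actual = {
            "startup_ms": float(g["startup_ms"]),  # milliseconds
            "total_ms": float(g["total_ms"]),      # milliseconds
            "rows": int(g["actual_rows"]),
            "loops": int(g["loops"]),
        }
    return {"estimate": estimate, "actual": actual}


parsed = parse_plan_line(LINE)
print(parsed)
```

Keeping separate key names for the two groups avoids suggesting that a cost of 0.01 and a time of 0.007 ms are comparable quantities.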

...Robert

In response to

Re: machine-readable explain output at 2009-06-16 12:53:28 from Andres Freund

Responses

Re: machine-readable explain output at 2009-06-16 13:30:58 from Andres Freund
Re: machine-readable explain output at 2009-06-16 13:45:53 from Andrew Dunstan
Re: machine-readable explain output at 2009-06-17 14:27:25 from Peter Eisentraut
