Explain: Duplicate key "Workers" in JSON format

From: Pierre Giraud <pierre(dot)giraud(at)dalibo(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Explain: Duplicate key "Workers" in JSON format
Date: 2019-08-23 12:47:56
Message-ID: 41ee53a5-a36e-cc8f-1bee-63f6565bb1ee@dalibo.com
Lists: pgsql-hackers

Hi,

I'm currently working on a tool to visualize execution plans [1]. For
those who know PEV, it is a fork of that great tool, which has not been
active for more than 2 years.

Among other things, I'd like to show information about parallel
queries, starting with information about the workers.

I run into a problem when parsing a plan in the JSON format: the
"Workers" key may be duplicated within a single node.

While it is not strictly invalid for a JSON object to contain the same
key several times at the same level, it makes it almost impossible to
get the full information once the string is parsed. When parsing such a
JSON string, typically only the last occurrence of the key is kept, so
part of the information is lost.
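
As a small illustration of that loss, here is a sketch using a trimmed-down
node written by hand for this example (it is not actual EXPLAIN output, and
the variable names are only illustrative). It relies on the standard
behaviour of JSON.parse, which keeps the last value for a repeated key.

```javascript
// Hand-written, trimmed-down Sort node with a repeated "Workers" key.
// This is an illustrative sample, not real EXPLAIN output.
const sortNode = `{
  "Node Type": "Sort",
  "Workers": [
    {"Worker Number": 0, "Sort Method": "external merge"},
    {"Worker Number": 1, "Sort Method": "external merge"}
  ],
  "Workers": [
    {"Worker Number": 0, "Actual Rows": 2692973},
    {"Worker Number": 1, "Actual Rows": 2684443}
  ]
}`;

const parsed = JSON.parse(sortNode);

// Only one "Workers" property exists after parsing.
console.log(Object.keys(parsed)); // ["Node Type", "Workers"]

// The first array (sort details) has been overwritten by the second one.
console.log("Sort Method" in parsed.Workers[0]); // false
console.log(parsed.Workers[0]["Actual Rows"]);   // 2692973
```

No error or warning is raised, so a consumer has no simple way to notice
that the per-worker sort details were dropped.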

JSON validators warn with the following message: "Duplicate key,
names should be unique."

Here's an example of a plan in VERBOSE mode.

[
  {
    "Plan": {
      "Node Type": "Gather Merge",
      "Parallel Aware": false,
      "Actual Startup Time": 1720.052,
      "Actual Total Time": 4252.290,
      "Actual Rows": 10000000,
      "Actual Loops": 1,
      "Output": ["c1", "c2"],
      "Workers Planned": 2,
      "Workers Launched": 2,
      "Plans": [
        {
          "Node Type": "Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Actual Startup Time": 1558.638,
          "Actual Total Time": 2127.522,
          "Actual Rows": 3333333,
          "Actual Loops": 3,
          "Output": ["c1", "c2"],
          "Sort Key": ["t1.c1"],
          "Sort Method": "external merge",
          "Sort Space Used": 126152,
          "Sort Space Type": "Disk",
          "Workers": [
            {
              "Worker Number": 0,
              "Sort Method": "external merge",
              "Sort Space Used": 73552,
              "Sort Space Type": "Disk"
            },
            {
              "Worker Number": 1,
              "Sort Method": "external merge",
              "Sort Space Used": 73320,
              "Sort Space Type": "Disk"
            }
          ],
          "Workers": [
            {
              "Worker Number": 0,
              "Actual Startup Time": 1487.846,
              "Actual Total Time": 1996.879,
              "Actual Rows": 2692973,
              "Actual Loops": 1
            },
            {
              "Worker Number": 1,
              "Actual Startup Time": 1468.256,
              "Actual Total Time": 2012.744,
              "Actual Rows": 2684443,
              "Actual Loops": 1
            }
          ],
          "Plans": [
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Outer",
              "Parallel Aware": true,
              "Relation Name": "t1",
              "Schema": "public",
              "Alias": "t1",
              "Actual Startup Time": 0.211,
              "Actual Total Time": 372.858,
              "Actual Rows": 3333333,
              "Actual Loops": 3,
              "Output": ["c1", "c2"],
              "Workers": [
                {
                  "Worker Number": 0,
                  "Actual Startup Time": 0.029,
                  "Actual Total Time": 368.356,
                  "Actual Rows": 2692973,
                  "Actual Loops": 1
                },
                {
                  "Worker Number": 1,
                  "Actual Startup Time": 0.033,
                  "Actual Total Time": 368.874,
                  "Actual Rows": 2684443,
                  "Actual Loops": 1
                }
              ]
            }
          ]
        }
      ]
    },
    "Planning Time": 0.170,
    "Triggers": [
    ],
    "Execution Time": 4695.141
  }
]

As you can see, the "Workers" key is duplicated in the Sort node.

Here's the equivalent in TEXT format:

---------------------------------
 Gather Merge (cost=735306.27..1707599.95 rows=8333364 width=17) (actual time=1560.468..3749.583 rows=10000000 loops=1)
   Output: c1, c2
   Workers Planned: 2
   Workers Launched: 2
   -> Sort (cost=734306.25..744722.95 rows=4166682 width=17) (actual time=1474.182..1967.788 rows=3333333 loops=3)
         Output: c1, c2
         Sort Key: t1.c1
         Sort Method: external merge Disk: 125168kB
         Worker 0: Sort Method: external merge Disk: 73768kB
         Worker 1: Sort Method: external merge Disk: 74088kB
         Worker 0: actual time=1431.136..1883.370 rows=2700666 loops=1
         Worker 1: actual time=1431.175..1891.630 rows=2712505 loops=1
         -> Parallel Seq Scan on public.t1 (cost=0.00..105264.82 rows=4166682 width=17) (actual time=0.214..386.014 rows=3333333 loops=3)
               Output: c1, c2
               Worker 0: actual time=0.027..382.325 rows=2700666 loops=1
               Worker 1: actual time=0.038..384.951 rows=2712505 loops=1
 Planning Time: 0.180 ms
 Execution Time: 4166.867 ms
(18 rows)
---------------------------------

I think that the text format should stay as is.

For the JSON format, however, I think it would be better if the
"Workers" data were merged under a single key. Parsing should not
require anything more than
"var myObj = JSON.parse(theJsonString);".
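
To make the idea of merged "Workers" data concrete, here is a rough sketch of
the shape I have in mind. The helper name mergeWorkers is hypothetical and
only illustrates the desired output; it says nothing about how this would be
done on the server side. It assumes that entries are matched on their
"Worker Number", and the values are copied from the Sort node of the JSON
plan in this message.

```javascript
// Hypothetical helper: combine several per-worker arrays into one array,
// merging the entries that share the same "Worker Number".
function mergeWorkers(...workerArrays) {
  const byNumber = new Map();
  for (const workers of workerArrays) {
    for (const worker of workers) {
      const n = worker["Worker Number"];
      // Later properties are added to whatever was already collected.
      byNumber.set(n, { ...(byNumber.get(n) ?? {}), ...worker });
    }
  }
  return [...byNumber.values()].sort(
    (a, b) => a["Worker Number"] - b["Worker Number"]
  );
}

// The two "Workers" arrays of the Sort node, written as separate values.
const sortInfo = [
  { "Worker Number": 0, "Sort Method": "external merge",
    "Sort Space Used": 73552, "Sort Space Type": "Disk" },
  { "Worker Number": 1, "Sort Method": "external merge",
    "Sort Space Used": 73320, "Sort Space Type": "Disk" },
];
const timingInfo = [
  { "Worker Number": 0, "Actual Startup Time": 1487.846,
    "Actual Total Time": 1996.879, "Actual Rows": 2692973, "Actual Loops": 1 },
  { "Worker Number": 1, "Actual Startup Time": 1468.256,
    "Actual Total Time": 2012.744, "Actual Rows": 2684443, "Actual Loops": 1 },
];

// One entry per worker, carrying both the sort and the timing properties.
const merged = mergeWorkers(sortInfo, timingInfo);
console.log(JSON.stringify(merged, null, 2));
```

With a single "Workers" array of this shape in the output, a plain
JSON.parse would be enough to get all per-worker details.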

What do you think?

Thanks.

[1] https://dalibo.github.io/pev2/
