Re: [PATCH v3] pg_progress() SQL function to monitor progression of long running SQL queries/utilities

From: Remi Colinet <remi(dot)colinet(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH v3] pg_progress() SQL function to monitor progression of long running SQL queries/utilities
Date: 2017-08-01 16:35:11
Message-ID: CADdR5nxfJ4jHiZrC46crer1LCMNeiw01gm2qO0Apgqx2ewVYGQ@mail.gmail.com
Lists: pgsql-hackers

I did it in version 2 of the patch.
The patch could yield TEXT, JSON, and XML output.

For the query below, it gives:

=> Terminal 1
test=# select * from t_10m, t_1m where t_10m.md5 like '%cb%';

=> Terminal 2
test=# \watch PROGRESS 9546;
Wed 10 May 2017 06:29:59 PM CEST (every 1s)

                                               PLAN PROGRESS
---------------------------------------------------------------------------------------------------------
status: <query running>
query: select * from t_10m, t_1m where t_10m.md5 like '%cb%';
time used (s): 10
Nested Loop
  ->  Seq Scan on t_1m => rows 7/1000000 0% => blks 8334/8334 100%
  ->  Materialize => file read readptrcount=1 rows write=1189285 read=6584854 disk use (bytes) 53842965
        ->  Seq Scan on t_10m => rows 1145596/738172 155%
              Filter: (md5 ~~ '%cb%'::text)
total disk space used (MB) 51
(9 rows)

=> Terminal 2
test=# PROGRESS (FORMAT JSON) 9546;
PLAN PROGRESS
----------------------------------------------------------------------
[ +
"status": "<query running>", +
"query": "select * from t_10m, t_1m where t_10m.md5 like '%cb%';",+
"time used (s)": 0, +
"single worker": { +
"Node Type": "Nested Loop", +
"Partial Mode": "", +
"Operation": "single worker", +
"Parent Relationship": "single worker", +
"Custom Plan Provider": "(@\u0004\u0001", +
"Parallel Aware": false, +
"Outer": { +
"Node Type": "Seq Scan", +
"Strategy": "", +
"Partial Mode": "single worker", +
"Operation": "Outer", +
"Parent Relationship": "Outer", +
"Custom Plan Provider": "(@\u0004\u0001", +
"Parallel Aware": false, +
"relation": "t_1m", +
"rows fetched": 1, +
"rows total": 1000000, +
"rows percent": 0, +
"blocks fetched": 8334, +
"blocks total": 8334, +
"blocks percent": 100 +
}, +
"Inner": { +
"Node Type": "Materialize", +
"Strategy": "", +
"Partial Mode": "single worker", +
"Operation": "Inner", +
"Parent Relationship": "Inner", +
"Custom Plan Provider": "(@\u0004\u0001", +
"Parallel Aware": false, +
"file store": "write", +
"readptrcount": 1, +
"rows write": 297256, +
"rows read": 0, +
"disk use (bytes)": 11911168, +
"Outer": { +
"Node Type": "Seq Scan", +
"Strategy": "", +
"Partial Mode": "Inner", +
"Operation": "Outer", +
"Parent Relationship": "Outer", +
"Custom Plan Provider": "HtFH\b[]\u000f\u001f", +
"Parallel Aware": false, +
"relation": "t_10m", +
"rows fetched": 253566, +
"rows total": 738172, +
"rows percent": 34, +
"blocks fetched": 18436, +
"blocks total": 83334, +
"blocks percent": 22, +
"Filter": "(md5 ~~ '%cb%'::text)" +
} +
} +
}, +
"unit": "MB", +
"total disk space used": 11 +
]
(1 row)

test=#

I'm skeptical about the use of JSON, XML, and other such formats in this output.

Does anyone use these formats (XML, JSON, YAML) for EXPLAIN output?
I suspect only the TEXT format is actually being used.

This looks better with a SQL table made of name/value fields, which can be used
both by administrators and by utilities.
It is also much more in line with SQL: there is no need to parse JSON, XML, or
any other format, and the output can be filtered with a WHERE clause.
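For instance (just a sketch, reusing the pid/bid/name/value/unit columns of
pg_progress() from the output quoted below; the exact signature and column
names may still change), a utility could keep only the completion rows of a
given backend:

test=# SELECT pid, bid, name, value, unit FROM pg_progress(0,0) WHERE pid = 14106 AND name = 'completion';

Such a filter is not possible with the TEXT/JSON/XML output without parsing it
first.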

Rgds
Remi

2017-07-31 12:10 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2017-07-31 11:09 GMT+02:00 Remi Colinet <remi(dot)colinet(at)gmail(dot)com>:
>
>>
>>
>> 2017-07-26 15:27 GMT+02:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>
>>> On Wed, Jun 21, 2017 at 10:01 AM, Remi Colinet <remi(dot)colinet(at)gmail(dot)com>
>>> wrote:
>>> > test=# SELECT pid, ppid, bid, concat(repeat(' ', 3 * indent), name), value, unit FROM pg_progress(0,0);
>>> > pid | ppid | bid | concat | value | unit
>>> > -------+------+-----+------------------+------------------+---------
>>> > 14106 | 0 | 4 | status | query running |
>>> > 14106 | 0 | 4 | relationship | progression |
>>> > 14106 | 0 | 4 | node name | Sort |
>>> > 14106 | 0 | 4 | sort status | on tapes writing |
>>> > 14106 | 0 | 4 | completion | 0 | percent
>>> > 14106 | 0 | 4 | relationship | Outer |
>>> > 14106 | 0 | 4 | node name | Seq Scan |
>>> > 14106 | 0 | 4 | scan on | t_10m |
>>> > 14106 | 0 | 4 | fetched | 25049 | block
>>> > 14106 | 0 | 4 | total | 83334 | block
>>> > 14106 | 0 | 4 | completion | 30 | percent
>>> > (11 rows)
>>> >
>>> > test=#
>>>
>>> Somehow I imagined that the output would look more like what EXPLAIN
>>> produces.
>>>
>>
>>
>> I had initially used the same output as for the ANALYZE command:
>>
>> test=# PROGRESS 14611;
>> PLAN PROGRESS
>> -----------------------------------------------------------------------------------------
>> Gather Merge
>>   ->  Sort => dumping tuples to tapes
>>         rows r/w merge 0/0 rows r/w effective 0/1464520 0%
>>         Sort Key: md5
>>         ->  Parallel Seq Scan on t_10m => rows 1464520/4166700 35% blks 36011/83334 43%
>> (5 rows)
>>
>> test=#
>>
>> But this restricts the use to "human consumers". Using a table output
>> with name/value pairs allows use by utilities, for instance, without any
>> parsing. This is less handy for administrators, but far better for 3rd
>> party utilities. Another option is to create a PL/SQL command on top of
>> the pg_progress() SQL function to produce an output similar to the one
>> of the ANALYZE command.
>>
>
> You can support XML and JSON output formats like EXPLAIN does.
>
> https://www.postgresql.org/docs/current/static/sql-explain.html
>
> Regards
>
> pavel
>
>>
>>
>>> > If the one shared memory page is not enough for the whole progress report,
>>> > the progress report transfer between the 2 backends is done with a series
>>> > of request/response exchanges. Before setting the latch, the monitored backend
>>> > writes the size of the data dumped in shared memory and sets a status to
>>> > indicate that more data is to be sent through the shared memory page. The
>>> > monitoring backend gets the result, sends another signal, and then waits for
>>> > the latch again. The monitored backend does not collect a new progress report
>>> > but continues to dump the already collected report. The exchange goes on
>>> > until the full progress report has been dumped.
>>>
>>> This is basically what shm_mq does. We probably don't want to
>>> reinvent that code, as it has taken a surprising amount of debugging
>>> to get it fully working.
>>>
>>
>> Yes, I had once considered this solution but then moved away as I was
>> unsure of the exact need for the transfer of the progress report between
>> the monitored and the monitoring backends.
>> I'm going to switch to shm_mq.
>>
>> Thx & Rgds
>>
>>
>>
>>>
>>> --
>>> Robert Haas
>>> EnterpriseDB: http://www.enterprisedb.com
>>> The Enterprise PostgreSQL Company
>>>
>>
>>
>
