Long running query: How to monitor the progress

From: Johann Spies <johann(dot)spies(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Long running query: How to monitor the progress
Date: 2015-08-25 07:22:45
Message-ID: CAGZ55DTvuGvVkcQO4vgNrbQOkFZedGy+5Y5A1qFB3PT5HN2GiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Working with 9.4.

We are in the process of unpacking complicated XML-data into tables.
XML-data are already in a table with two fields (id, xml) - 47+ million
records.

Some of hour queries to extract the data and insert it in other tables runs
for days and in one case we have created a table with 758million unique
records.

Now my question. Is there a way to monitor the progress of a long running
query like this?

I have recently read that it is probably better for processes like this to
copy result of the query to a csv-file and then import it again with copy
as an insert. Next time I will try that.

The following query has been running for 6 days now and are still running
(I have anonymized it a little bit) on a server with 768 GB RAM. It has
created 44 temporary files so far:

INSERT INTO table_a_link(uid,gn_id)

WITH p AS
(SELECT ARRAY[ARRAY['t','some_xpath']] AS some_xpath),
q AS
(SELECT (xpath('//t:UID/text()',xml,some_xpath))[1] uid,
unnest(xpath('//t:grant',xml,some_xpath)) AS gr
FROM source.xml_data a,
p
WHERE xpath_exists('//t:grant/t:grant_agency', xml ,some_xpath)),
r AS
(
SELECT
CASE WHEN
xpath_exists('//t:grant_ids', gr, some_xpath)
THEN
unnest(xpath('//t:grant_ids', gr, some_xpath))
ELSE
NULL
END
AS GR_ids
FROM q,
p ) ,
y as (SELECT A.UUID AS FO_ID,
/* unnest(xpath('//t:grant_agency/text()',GR,ns))::citext AS agency,
*/ CASE WHEN
xpath_exists('//t:grant_id', gr_ids, some_xpath)
THEN
unnest(xpath('//t:grant_id/text()', gr_ids, some_xpath))::citext
ELSE
NULL
END
grant_NO,
uid::varchar(19)
from WOS.FUNDING_ORG A, p,q
left join r on (xpath('//t:grant/t:grant_ids/t:grant_id/text()',gr,
ARRAY[ARRAY['t','some_xpath']])::citext =

xpath('//t:grant_id/text()',GR_IDS,ARRAY[ARRAY['t','some_xpath']])::citext)
WHERE A.FUNDING_ORG =
(xpath('//t:grant_agency/text()',GR,some_xpath))[1]::CITEXT
)

select distinct y.uid, B.uuid gn_id

from y, table_b B
where
y.fo_id = B.fo_id
and
y.grant_no is not distinct from b.grant_no

Regards.
Johann

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Henrik Thostrup Jensen 2015-08-25 13:11:43 Gist indexing performance with cidr types
Previous Message Jeff Janes 2015-08-24 17:04:42 Re: query not using GIN index