Re: how to monitor the progress of really large bulk operations?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Mike Sofen <msofen(at)runbox(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: how to monitor the progress of really large bulk operations?
Date: 2016-09-28 04:17:39
Message-ID: CAFj8pRBP+qacndYRa9P5str2KaydtSXDbcMBCzb2m-N1FsMQQQ@mail.gmail.com
Lists: pgsql-general

2016-09-28 6:13 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
> 2016-09-27 23:03 GMT+02:00 Mike Sofen <msofen(at)runbox(dot)com>:
>
>> Hi gang,
>>
>>
>>
>> On PG 9.5.1, Linux, I’m running some large ETL operations, migrating data
>> from a legacy MySQL system into PG, upwards of 250m rows in a transaction
>> (it’s on a big box). It’s always a 2-step operation: the first step extracts
>> the raw MySQL data and pulls it to the target big box into staging tables
>> that match the source; the second step reads the landed dataset and
>> transforms it into the final formats, linking to newly generated ids,
>> compressing big subsets into jsonb documents, etc.
>>
>>
>>
>> While I could break it into smaller chunks, it hasn’t been necessary, and
>> it doesn’t eliminate my need: how to view the state of a transaction in
>> flight, seeing how many rows have been read or inserted (possible for a
>> transaction in flight?), memory allocations across the various PG
>> processes, etc.
>>
>>
>>
>> Possible or a hallucination?
>>
>>
>>
>> Mike Sofen (Synthetic Genomics)
>>
>
> some years ago I used a trick:
> http://okbob.blogspot.cz/2014/09/nice-unix-filter-pv.html#links
>

pltoolbox has a counter function that raises a NOTICE every N processed rows:
https://github.com/okbob/pltoolbox/blob/master/utils.c

pavel=# insert into omega2 select (x.xx).*
from (select pst.counter(omega,200000, true) xx
from omega
) x;
NOTICE: processed 200000 rows, current value is '(5,8)'
NOTICE: processed 200000 rows, current value is '(5,8)'
NOTICE: processed 400000 rows, current value is '(6,8)'
NOTICE: processed 400000 rows, current value is '(6,8)'
NOTICE: processed 600000 rows, current value is '(7,8)'
NOTICE: processed 600000 rows, current value is '(7,8)'
NOTICE: processed 800000 rows, current value is '(1,8)'
NOTICE: processed 800000 rows, current value is '(1,8)'
NOTICE: processed 1000000 rows, current value is '(5,8)'
NOTICE: processed 1000000 rows, current value is '(5,8)'
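
If installing an extension is not an option, a similar effect can probably be had with a plain sequence, because nextval() is not transactional and its effect is visible to other sessions while the loading transaction is still open. This is only a sketch, not what pltoolbox does: the sequence name etl_progress is made up for illustration, and omega/omega2 are the tables from the example above.

```sql
-- Assumption: etl_progress is a hypothetical helper sequence.
-- Create it (and let it commit) BEFORE starting the big transaction,
-- otherwise other sessions cannot see it.
CREATE SEQUENCE etl_progress;

-- Session 1: the bulk load. nextval() is volatile, so it is evaluated
-- once per row read from omega and is not optimized away.
INSERT INTO omega2
SELECT (x.o).*
  FROM (SELECT omega AS o,
               nextval('etl_progress') AS n
          FROM omega) x;

-- Session 2: poll at any time while session 1 is still running.
-- Sequence state is not rolled back or hidden by the open transaction.
SELECT last_value AS rows_processed_so_far FROM etl_progress;

-- Cleanup once the load has finished.
DROP SEQUENCE etl_progress;
```

The count reflects rows that passed through the subquery, so it should track rows read rather than rows durably inserted; if the load rolls back, the sequence value stays where it was.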

>
>
> Regards
>
> Pavel
>
>
