Re: [Proposal] Progress bar for pg_dump/pg_restore

From: Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com>
To: "mmoncure(at)gmail(dot)com" <mmoncure(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Akio Iwaasa <aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com>
Subject: Re: [Proposal] Progress bar for pg_dump/pg_restore
Date: 2015-06-24 10:48:14
Message-ID: 12A9442FBAE80D4E8953883E0B84E0885728A1@BPXM01GP.gisp.nec.co.jp
Lists: pgsql-hackers

Hi, Merlin.

Thank you for your comments, and sorry for the late response.

> *) how do you estimate %done and ETA when dumping?

As I mentioned in my reply to Andres, I think %done and ETA can be estimated from the number of tuples in "pg_class.reltuples".
As you may know, pg_dump writes each tuple to the output file as soon as it reads it while executing "COPY TO".
Therefore, pg_dump can calculate %done and ETA by reading "pg_class.reltuples" and measuring the number of tuples dumped per second.
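
To illustrate the calculation, here is a rough sketch in Python. It is only an illustration under my own assumptions, not code from any patch: the function names and parameters are hypothetical, the bar width of 20 is arbitrary, and I assume the time shown is the remaining time in minutes:seconds.

```python
def estimate_progress(total_tuples, dumped_tuples, elapsed_seconds):
    """Return (percent_done, eta_seconds) from tuple counts.

    total_tuples    -- sum of pg_class.reltuples over the tables to dump
                       (an estimate, so it may differ from the real count)
    dumped_tuples   -- number of tuples written so far
    elapsed_seconds -- wall-clock time since the dump started
    """
    if total_tuples <= 0:
        # No usable estimate available: report nothing done, ETA unknown.
        return 0.0, None
    # reltuples can be stale, so never report more than 100%.
    done = min(dumped_tuples, total_tuples)
    percent = 100.0 * done / total_tuples
    if done == 0 or elapsed_seconds <= 0:
        # No throughput measured yet, so the ETA is unknown.
        return percent, None
    rate = done / elapsed_seconds          # tuples dumped per second
    eta = (total_tuples - done) / rate     # seconds remaining at that rate
    return percent, eta


def format_progress(percent, eta_seconds, width=20):
    """Render one status line such as '====> (25%) 01:30'."""
    filled = int(width * percent / 100)
    bar = "=" * (filled - 1) + ">" if filled > 0 else ">"
    if eta_seconds is None:
        eta = "--:--"
    else:
        minutes, seconds = divmod(int(eta_seconds), 60)
        eta = f"{minutes:02d}:{seconds:02d}"
    return f"{bar} ({percent:.0f}%) {eta}"


# Example: 500 of an estimated 1000 tuples dumped after 10 seconds.
percent, eta = estimate_progress(1000, 500, 10)
print(format_progress(percent, eta))
```

Because "pg_class.reltuples" is only an estimate, the sketch clamps the result at 100% instead of trusting the count exactly.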

> *) what's the benefit of doing this instead of using a utility like 'pv'?

Thank you for the new point of view. I did not know about the 'pv' utility. :)
I tried pg_dump with pv and found that this approach measures progress by the number of characters passed through the pipe.
From my point of view, using 'pv' has the following problems.
I think that at least points 1 to 4 below are benefits of my proposal over 'pv'.

1) %done and ETA are calculated from the number of characters passed through the pipe (as mentioned above), and the total number of characters has to be specified by hand.
Therefore, if the specified total is badly wrong, %done and ETA will be far from their true values.
2) Since 'pv' relies on a pipe, pg_dump/pg_restore can't be used together with the '-j' option.
This forces pg_dump/pg_restore to run with only one process even when running with two or more processes is possible.
3) For the same reason, the command line for pg_dump/pg_restore becomes longer and harder to use.
This may spoil the user experience.
4) To pass data through a pipe, pg_dump can't be used together with the '-f' option, and pg_restore can't be used together with the '-d' option.
This may also spoil the user experience because the command line becomes longer and harder to use.
5) Neither this approach nor my proposal resolves the concern about "CREATE INDEX".
We need to discuss this further.
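
As a small illustration of point 1, here is a sketch in Python of how a wrong hand-specified total distorts the displayed percentage. This is a simplified model under my own assumptions, not pv's actual code; the function name and the sizes are hypothetical.

```python
def reported_percent(bytes_passed, assumed_total):
    """Percentage a byte-counting tool would show for a given assumed total."""
    return 100.0 * bytes_passed / assumed_total


actual_total = 10 * 1024**3    # real size of the dump output (hypothetical)
assumed_total = 5 * 1024**3    # total specified by hand, half the real size

halfway = actual_total // 2    # bytes passed when the dump is really half done
true_pct = reported_percent(halfway, actual_total)
shown_pct = reported_percent(halfway, assumed_total)

print(f"true: {true_pct:.0f}%  shown: {shown_pct:.0f}%")
```

In this model the display already reaches 100% when the dump is only half finished, so the ETA derived from it is equally misleading.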

regards,
--
Taiki Kondo

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure(at)gmail(dot)com]
Sent: Friday, June 12, 2015 10:42 PM
To: Taiki Kondo
Cc: pgsql-hackers(at)postgresql(dot)org; Akio Iwaasa
Subject: Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

On Fri, Jun 12, 2015 at 7:45 AM, Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com> wrote:
> Hi, all.
>
> I am a newbie on hackers.
> I have an idea from my point of view as a user, and I would like to propose the following.
>
>
> Progress bar for pg_dump / pg_restore
> =====================================
>
> Motivation
> ----------
> "pg_dump" and "pg_restore" show nothing if users don't specify the verbose (-v) option.
> For a table too large to finish in a few minutes, this silence makes some users worry about whether everything is all right.
>
> I propose this feature to free these users from worrying.
>
>
> Design & API
> ------------
> While pg_dump / pg_restore is running, a progress bar and the estimated time to finish are shown on screen as follows.
>
>
> =========> (50%) 15:50
>
> The bar ("=>" above) and the percentage value ("50%" above) show the progress, and the time ("15:50" above) shows the estimated time to finish.
> (This percentage is the ratio for the whole processing.)
>
> The percentage and time are recalculated and shown every second.
>
> In pg_dump, the information required for calculating the percentage and time comes from pg_class.
>
> In pg_restore, to calculate the same things, I want to record the total number of command lines in the pg_dump file, so I would like to add a new element to the "Archive" structure.
> (This means that the version number of the archive format changes.)
>
>
> Usage
> ------
> To use this feature, the user must specify the "-P" option on the command line.
> (This definition is also temporary, so it can be changed if it causes problems.)
>
> $ pg_dump -Fc -P -f foo.pgdump foo
>
> I also think it would be better to enable this feature by default without forcing users to specify any option, but that means changing the default behavior and can cause problems in programs that expect no output on stdout.
>
>
> I will implement this feature if this proposal is accepted by hackers.
> (I will probably not use ncurses for implementing this feature, because ncurses can not be used with the standard printf family of functions.)
>
>
> Any comments are welcome.

*) how do you estimate %done and ETA when dumping?

*) what's the benefit of doing this instead of using a utility like 'pv'?

merlin
