From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PROPOSAL] VACUUM Progress Checker. |
Date: | 2015-07-22 07:02:02 |
Message-ID: | CANP8+j+NbHL70=ppm=9gkw56QDSAF=cc+Jk_iO0CBUQ_bUMjGQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 21 July 2015 at 21:24, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jun 30, 2015 at 4:32 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> > Yes, I suggest just a single column on pg_stat_activity called
> pct_complete
> >
> > trace_completion_interval = 5s (default)
> >
> > Every interval, we report the current % complete for any operation that
> > supports it. We just show NULL if the current operation has not reported
> > anything or never will.
>
> I am deeply skeptical about the usefulness of a progress-reporting
> system that can only report one number. I think that, in many cases,
> it won't be possible to compute an accurate completion percentage, and
> even if it is, people may want more data than that for various reasons
> anyway.
>
The goal here was to have a common metric for all tasks. Multiple numbers
are OK for me, but not extended detail (yet!)
As I said later:
Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I'm interested in seeing a report that relates to actual progress made.
> Predicted total work required is also interesting, but is much less
trustworthy figure.
I'm aware of the difficulties for VACUUM in particular and agree with your
scenario/details.
That does not deter me from wishing to see high level information, even it
varies or is inaccurate. The "arrival time" on my Sat Nav is still useful,
even if it changes because of traffic jams that develop while my journey is
in progress. If the value bothers me, I look at the detail. So both summary
and detail information are useful, but summary is more important even
though it is less accurate.
Instead of getting told "we're X% done" (according to some arcane
> formula), it's quite reasonable to think that people will want to get
> a bunch of values, e.g.:
>
> 1. For what percentage of heap pages have we completed phase one (HOT
> prune + mark all visible if appropriate + freeze + remember dead
> tuples)?
> 2. For what percentage of heap pages have we completed phase two (mark
> line pointers unused)?
> 3. What percentage of maintenance_work_mem are we currently using to
> remember tuples?
>
> For a query, the information we want back is likely to be even more
> complicated; e.g. EXPLAIN output with row counts and perhaps timings
> to date for each plan node. We can insist that all status reporting
> get boiled down to one number, but I suspect we would be better off
> asking ourselves how we could let commands return a richer set of
> data.
I agree that it is desirable to have a more detailed breakdown of what is
happening. As soon as we do that we hit the need for very action-specific
information reporting, which renders the topic much harder and much more
specific.
For me, the user workflow looks like these....
Worried: "Task X is taking ages? When is it expected to finish?"
Ops: 13:50
<sometime later, about 14:00>
Worried: "Task X is still running? But I thought its ETA was 13:50?"
Ops: Now says 14:30
Worried: "Is it stuck, or is it making progress?"
Ops: Looks like its making progress
Worried: "Can we have a look at it and find out what its doing?"
Worried: "When will Task Y finish?"
Ops: Monday at 11am
Worried: "Bad news! We should cancel it on Sunday evening."
The point is that nobody looks at the detailed info until we have looked at
the summary. So the summary of progress/completion time is important, even
if it is possibly wrong. The detail is also useful. I think we should have
both, but I'd like to see the summary info first, because it is the most
useful, best leading indicator of problems.
In terms of VACUUM specifically: VACUUM should be able to assess beforehand
whether it will scan the indexes, or it can just assume that it will need
to scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
task it has before it starts.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro HORIGUCHI | 2015-07-22 07:10:17 | Re: Asynchronous execution on FDW |
Previous Message | Etsuro Fujita | 2015-07-22 06:50:02 | Re: Alpha2/Beta1 |