Re: [HACKERS] CLUSTER command progress monitor

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2017-11-21 20:57:19
Message-ID: CA+Tgmob00ASAYZUvtCmMY45LfO3E2D-re59DEOHY7Lf1KLHXiw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
>> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > 1. scanning heap
>> > 2. sort tuples
>>
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard. In the simple case where the data fits in
>> work_mem, none of the work of the sort gets done until all the data is
>> read. Once you switch to an external sort, you're writing batch
>> files, so a lot of the work is now being done during data loading.
>> But as the number of batch files grows, the final merge at the end
>> becomes an increasingly noticeable part of the cost, and eventually
>> you end up needing multiple merge passes. I think we need some smart
>> way to report on sorts so that we can tell how much of the work has
>> really been done, but I don't know how to do it.
>
> Whatever complexity is hidden in the sort, cost_sort() should have taken it
> into consideration when called via plan_cluster_use_sort(). Thus I think that
> once we have both startup and total cost, the current progress of the sort
> stage can be estimated from the current number of input and output
> rows. Please tell me if my proposal appears to be too simplistic.

I think it is far too simplistic. If the sort is being fed by a
sequential scan, reporting the number of blocks scanned so far as
compared to the total number that will be scanned would be a fine way
of reporting on the progress of the sequential scan -- and it's better
to use blocks, whose count we know exactly, than rows, whose count we
can only estimate. But that's the *scan* progress, not the *sort*
progress.
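
To make the block-based idea concrete, here is a minimal sketch of
the arithmetic, written in Python for brevity. The function name and
its parameters are hypothetical and are not part of any existing
PostgreSQL API; it covers only the scan side and says nothing about
how far along the sort is.

```python
def scan_progress(blocks_scanned: int, total_blocks: int) -> float:
    """Return the fraction of a sequential scan completed, in blocks.

    Hypothetical helper for illustration only. Both counts are assumed
    to be known exactly, which is the reason to prefer blocks over rows.
    """
    if blocks_scanned < 0 or total_blocks < 0:
        raise ValueError("block counts must be non-negative")
    if total_blocks == 0:
        # An empty relation has nothing left to scan.
        return 1.0
    # Clamp in case the relation grew after total_blocks was sampled.
    return min(blocks_scanned, total_blocks) / total_blocks


if __name__ == "__main__":
    print(f"{scan_progress(250, 1000):.0%} of the heap scanned")
```

Note that for a sort that fits in work_mem this number can reach 100%
before any of the sorting work has been done, which is why it cannot
stand in for sort progress.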

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
