From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Antonin Houska <ah(at)cybertec(dot)at> |
Cc: | Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] CLUSTER command progress monitor |
Date: | 2017-11-21 20:57:19 |
Message-ID: | CA+Tgmob00ASAYZUvtCmMY45LfO3E2D-re59DEOHY7Lf1KLHXiw@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
>> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > 1. scanning heap
>> > 2. sort tuples
>>
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard. In the simple case where the data fits in
>> work_mem, none of the work of the sort gets done until all the data is
>> read. Once you switch to an external sort, you're writing batch
>> files, so a lot of the work is now being done during data loading.
>> But as the number of batch files grows, the final merge at the end
>> becomes an increasingly noticeable part of the cost, and eventually
>> you end up needing multiple merge passes. I think we need some smart
>> way to report on sorts so that we can tell how much of the work has
>> really been done, but I don't know how to do it.
>
> Whatever complexity is hidden in the sort, cost_sort() should have taken it
> into consideration when called via plan_cluster_use_sort(). Thus I think that
> once we have both startup and total cost, the current progress of the sort
> stage can be estimated from the current number of input and output
> rows. Please remind me if my proposal appears to be too simplistic.

I think it is far too simplistic. If the sort is being fed by a
sequential scan, reporting the number of blocks scanned so far as
compared to the total number that will be scanned would be a fine way
of reporting on the progress of the sequential scan -- and it's better
to use blocks, which we know for sure about, than rows, at which we
can only guess. But that's the *scan* progress, not the *sort*
progress.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
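The distinction drawn above between *scan* progress and *sort* progress can be made concrete with a minimal C sketch. It uses a deliberately simplified cost model, which is an assumption and not PostgreSQL's tuplesort.c logic. Sort work is counted in "passes", where one pass reads and writes every tuple once. The function names and the parameters `mem_tuples` (how many tuples fit in work_mem) and `fanin` (how many runs are merged at a time) are hypothetical. The sketch ignores details of the real merge strategy and only shows how the block ratio and the sort's remaining work diverge.

```c
#include <assert.h>

/*
 * Hypothetical, simplified model; NOT PostgreSQL's tuplesort.c logic.
 * Sort work is counted in "passes": one pass reads and writes every
 * tuple once.  mem_tuples and fanin are illustrative parameters.
 */

/* Scan progress: block counts are known exactly, so this ratio is reliable. */
static double
scan_fraction(long blocks_scanned, long total_blocks)
{
    if (total_blocks <= 0)
        return 1.0;             /* empty relation: nothing left to scan */
    return (double) blocks_scanned / (double) total_blocks;
}

/* Merge passes needed to reduce nruns sorted runs to one, fanin at a time. */
static int
merge_passes(long nruns, int fanin)
{
    int         passes = 0;

    assert(fanin >= 2);         /* fanin < 2 would never converge */
    while (nruns > 1)
    {
        nruns = (nruns + fanin - 1) / fanin;    /* ceil(nruns / fanin) */
        passes++;
    }
    return passes;
}

/*
 * Fraction of the total sort work already done when the scan finishes.
 * In-memory sort: nothing is sorted until all input is read, so 0.
 * External sort: run formation (one pass) overlaps the scan, but every
 * merge pass still lies ahead.
 */
static double
sort_fraction_at_scan_end(long ntuples, long mem_tuples, int fanin)
{
    long        nruns;

    assert(mem_tuples > 0);
    if (ntuples <= mem_tuples)
        return 0.0;
    nruns = (ntuples + mem_tuples - 1) / mem_tuples;    /* initial sorted runs */
    return 1.0 / (1.0 + merge_passes(nruns, fanin));
}
```

Under this model, 1,000,000 input tuples with room for 100,000 in memory produce 10 initial runs. With `fanin` 16, one merge pass remains, so when the scan reports 100% the sort is only half done. With `fanin` 6, two passes remain (one third done). Shrinking memory to 1,000 tuples gives 1,000 runs and four merge passes at `fanin` 6 (one fifth done). In the in-memory case the sort has done no work at all when the scan finishes. The block ratio tracks the scan, not the sort.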