| From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
|---|---|
| To: | Antonin Houska <ah(at)cybertec(dot)at> |
| Cc: | Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [HACKERS] CLUSTER command progress monitor |
| Date: | 2017-11-21 20:57:19 |
| Message-ID: | CA+Tgmob00ASAYZUvtCmMY45LfO3E2D-re59DEOHY7Lf1KLHXiw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
>> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > 1. scanning heap
>> > 2. sort tuples
>>
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard. In the simple case where the data fits in
>> work_mem, none of the work of the sort gets done until all the data is
>> read. Once you switch to an external sort, you're writing batch
>> files, so a lot of the work is now being done during data loading.
>> But as the number of batch files grows, the final merge at the end
>> becomes an increasingly noticeable part of the cost, and eventually
>> you end up needing multiple merge passes. I think we need some smart
>> way to report on sorts so that we can tell how much of the work has
>> really been done, but I don't know how to do it.
>
> Whatever complexity is hidden in the sort, cost_sort() should have taken it
> into consideration when called via plan_cluster_use_sort(). Thus I think that
> once we have both startup and total cost, the current progress of the sort
> stage can be estimated from the current number of input and output
> rows. Please remind me if my proposal appears to be too simplistic.
I think it is far too simplistic. If the sort is being fed by a
sequential scan, reporting the number of blocks scanned so far as
compared to the total number that will be scanned would be a fine way
of reporting on the progress of the sequential scan -- and it's better
to use blocks, which we know for sure about, than rows, at which we
can only guess. But that's the *scan* progress, not the *sort*
progress.
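
A minimal sketch of the block-based scan reporting described above,
written in Python rather than PostgreSQL's C. The function name
scan_progress and its parameters are hypothetical and not part of any
PostgreSQL API. It covers only the scan; it says nothing about how far
along the sort is.

```python
def scan_progress(blocks_scanned: int, total_blocks: int) -> float:
    """Return the fraction of a sequential scan completed, in blocks.

    Both counts are assumed to be known exactly, which is why blocks
    are preferred over row estimates here.
    """
    if total_blocks < 0 or blocks_scanned < 0:
        raise ValueError("block counts must be non-negative")
    if blocks_scanned > total_blocks:
        raise ValueError("cannot have scanned more blocks than exist")
    if total_blocks == 0:
        return 1.0  # empty relation: the scan is trivially complete
    return blocks_scanned / total_blocks
```

For example, scan_progress(250, 1000) reports that a quarter of the
heap has been read, regardless of how many rows those blocks held.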
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company