Re: [HACKERS] CLUSTER command progress monitor

From: Tatsuro Yamada <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2019-03-06 06:38:54
Message-ID: 03cc5c0e-243c-e4a0-c5cf-a1f8380ca530@lab.ntt.co.jp
Lists: pgsql-hackers

On 2019/03/05 17:56, Tatsuro Yamada wrote:
> Hi Robert!
>
> On 2019/03/05 11:35, Robert Haas wrote:
>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
>> <yamada(dot)tatsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> === Current design ===
>>>
>>> The CLUSTER command uses either an Index Scan or a Seq Scan when scanning the heap.
>>> Depending on which one is chosen, the command will proceed in the
>>> following sequence of phases:
>>>
>>>     * Scan method: Seq Scan
>>>       0. initializing                 (*2)
>>>       1. seq scanning heap            (*1)
>>>       3. sorting tuples               (*2)
>>>       4. writing new heap             (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>>     * Scan method: Index Scan
>>>       0. initializing                 (*2)
>>>       2. index scanning heap          (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>> The VACUUM FULL command proceeds through the following sequence of phases:
>>>
>>>       1. seq scanning heap            (*1)
>>>       5. swapping relation files      (*2)
>>>       6. rebuilding index             (*2)
>>>       7. performing final cleanup     (*2)
>>>
>>> (*1): increments the value in the heap_tuples_scanned column
>>> (*2): only updates the phase column
>>
>> All of that sounds good.
>>
>>> The view provides CLUSTER command progress details as follows:
>>> # \d pg_stat_progress_cluster
>>>          View "pg_catalog.pg_stat_progress_cluster"
>>>         Column        |  Type   | Collation | Nullable | Default
>>> ----------------------+---------+-----------+----------+---------
>>>  pid                  | integer |           |          |
>>>  datid                | oid     |           |          |
>>>  datname              | name    |           |          |
>>>  relid                | oid     |           |          |
>>>  command              | text    |           |          |
>>>  phase                | text    |           |          |
>>>  cluster_index_relid  | bigint  |           |          |
>>>  heap_tuples_scanned  | bigint  |           |          |
>>>  heap_tuples_vacuumed | bigint  |           |          |
>>
>> Still not sure if we need heap_tuples_vacuumed.  We could try to
>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
>> we're using a Seq Scan.
>
> I have no strong opinion on keeping heap_tuples_vacuumed, so I'll remove it in
> the next patch.
>
> Regarding heap_blks_scanned and heap_blks_total, I suppose they can be
> obtained from initscan(). I'll investigate further.
>
> cluster.c
>   copy_heap_data()
>     heap_beginscan()
>       heap_beginscan_internal()
>         initscan()
>
>
>
>>> === Discussion points ===
>>>
>>>    - Progress counter for "3. sorting tuples" phase
>>>       - Should we add pgstat_progress_update_param() in tuplesort.c like a
>>>         "trace_sort"?
>>>         Thanks to Peter Geoghegan for the useful advice!
>>
>> How would we avoid an abstraction violation?
>
> Hmm... What do you mean by an abstraction violation?
> If it is difficult to solve, I'd rather not add a progress counter for the sorting phase.
>
>
>>>    - Progress counter for "6. rebuilding index" phase
>>>       - Should we add "index_vacuum_count" to the view, as in the vacuum progress monitor?
>>>         If yes, I'll add pgstat_progress_update_param() to reindex_relation() in index.c.
>>>         However, I'm not sure whether that is okay.
>>
>> Doesn't seem unreasonable to me.
>
> I see, I'll add it later.

The attached file is a revised WIP patch that includes:

- Remove heap_tuples_vacuumed
- Add heap_blks_scanned and heap_blks_total
- Add index_vacuum_count

While trying to add the "heap_blks_scanned" and "heap_blks_total" columns, I realized
that the "heap_tuples_scanned" column is a suitable counter for both scan methods
(index scan and seq scan), because CLUSTER works on a tuple basis.
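
As a rough illustration of the phase ordering quoted earlier in this message,
the three sequences can be modeled in standalone C. This is only a sketch and
is not code from the attached patch: every identifier below (ClusterPhase,
CommandKind, phase_sequence(), and so on) is hypothetical, and only the phase
numbers and names come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Phase numbers follow the list quoted in this thread; identifiers are hypothetical. */
typedef enum
{
    PHASE_INITIALIZING = 0,
    PHASE_SEQ_SCAN_HEAP = 1,
    PHASE_INDEX_SCAN_HEAP = 2,
    PHASE_SORT_TUPLES = 3,
    PHASE_WRITE_NEW_HEAP = 4,
    PHASE_SWAP_REL_FILES = 5,
    PHASE_REBUILD_INDEX = 6,
    PHASE_FINAL_CLEANUP = 7
} ClusterPhase;

/* The three cases described in the thread (hypothetical names). */
typedef enum
{
    CMD_CLUSTER_SEQ,
    CMD_CLUSTER_INDEX,
    CMD_VACUUM_FULL
} CommandKind;

/* Text that would appear in the "phase" column. */
const char *
phase_name(ClusterPhase p)
{
    switch (p)
    {
        case PHASE_INITIALIZING:    return "initializing";
        case PHASE_SEQ_SCAN_HEAP:   return "seq scanning heap";
        case PHASE_INDEX_SCAN_HEAP: return "index scanning heap";
        case PHASE_SORT_TUPLES:     return "sorting tuples";
        case PHASE_WRITE_NEW_HEAP:  return "writing new heap";
        case PHASE_SWAP_REL_FILES:  return "swapping relation files";
        case PHASE_REBUILD_INDEX:   return "rebuilding index";
        case PHASE_FINAL_CLEANUP:   return "performing final cleanup";
    }
    return "unknown";
}

/* Returns 1 for phases marked (*1), which advance heap_tuples_scanned; 0 for (*2). */
int
phase_counts_tuples(ClusterPhase p)
{
    return p == PHASE_SEQ_SCAN_HEAP ||
           p == PHASE_INDEX_SCAN_HEAP ||
           p == PHASE_WRITE_NEW_HEAP;
}

/* Stores the ordered phase list for the given command in *out; returns its length. */
size_t
phase_sequence(CommandKind kind, const ClusterPhase **out)
{
    static const ClusterPhase seq_scan[] = {
        PHASE_INITIALIZING, PHASE_SEQ_SCAN_HEAP, PHASE_SORT_TUPLES,
        PHASE_WRITE_NEW_HEAP, PHASE_SWAP_REL_FILES, PHASE_REBUILD_INDEX,
        PHASE_FINAL_CLEANUP
    };
    static const ClusterPhase index_scan[] = {
        PHASE_INITIALIZING, PHASE_INDEX_SCAN_HEAP, PHASE_SWAP_REL_FILES,
        PHASE_REBUILD_INDEX, PHASE_FINAL_CLEANUP
    };
    static const ClusterPhase vacuum_full[] = {
        PHASE_SEQ_SCAN_HEAP, PHASE_SWAP_REL_FILES, PHASE_REBUILD_INDEX,
        PHASE_FINAL_CLEANUP
    };

    switch (kind)
    {
        case CMD_CLUSTER_SEQ:
            *out = seq_scan;
            return sizeof(seq_scan) / sizeof(seq_scan[0]);
        case CMD_CLUSTER_INDEX:
            *out = index_scan;
            return sizeof(index_scan) / sizeof(index_scan[0]);
        case CMD_VACUUM_FULL:
            *out = vacuum_full;
            return sizeof(vacuum_full) / sizeof(vacuum_full[0]);
    }
    *out = NULL;
    return 0;
}
```

A caller would fetch the sequence with phase_sequence(), report phase_name()
for each entry, and bump a tuple counter only where phase_counts_tuples()
returns 1.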

Regards,
Tatsuro Yamada

Attachment Content-Type Size
progress_monitor_for_cluster_command_v8_code.patch text/x-patch 13.0 KB
