Re: ANALYZE command progress checker

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: vinayak <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE command progress checker
Date: 2017-03-18 12:00:36
Message-ID: CAE9k0PmGhP4+Sa219AzwOghhQFerZirysDpp2p6Hem24Rcg+ZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Vinayak,

Here are couple of review comments that may need your attention.

1. Firstly, I am seeing some trailing whitespace errors when trying to
apply your v3 patch using git apply command.

[ashu(at)localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch
pg_stat_progress_analyze_v3.patch:155: trailing whitespace.
CREATE VIEW pg_stat_progress_analyze AS
pg_stat_progress_analyze_v3.patch:156: trailing whitespace.
SELECT
pg_stat_progress_analyze_v3.patch:157: trailing whitespace.
S.pid AS pid, S.datid AS datid, D.datname AS datname,
pg_stat_progress_analyze_v3.patch:158: trailing whitespace.
S.relid AS relid,
pg_stat_progress_analyze_v3.patch:159: trailing whitespace.
CASE S.param1 WHEN 0 THEN 'initializing'
error: patch failed: doc/src/sgml/monitoring.sgml:521

2) The test-case 'rules' is failing. I think it's failing because in
rules.sql 'ORDERBY viewname' is used which will put
'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the
list of catalog views. You may need to correct rules.out file
accordingly. This is what i could see in rules.sql,

SELECT viewname, definition FROM pg_views WHERE schemaname <>
'information_schema' ORDER BY viewname;

I am still reviewing your patch and doing some testing. Will update if
i find any issues. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Fri, Mar 17, 2017 at 3:16 PM, vinayak
<Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> On 2017/03/17 10:38, Robert Haas wrote:
>>
>> On Fri, Mar 10, 2017 at 2:46 AM, vinayak
>> <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>
>>> Thank you for reviewing the patch.
>>>
>>> The attached patch incorporated Michael and Amit comments also.
>>
>> I reviewed this tonight.
>
> Thank you for reviewing the patch.
>>
>> + /* Report compute index stats phase */
>> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> +
>> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);
>>
>> Hmm, is there really any point to this? And is it even accurate? It
>> doesn't look to me like we are computing any index statistics here;
>> we're only allocating a few in-memory data structures here, I think,
>> which is not a statistics computation and probably doesn't take any
>> significant time.
>>
>> + /* Report compute heap stats phase */
>> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> +
>> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);
>>
>> The phase you label as computing heap statistics also includes the
>> computation of index statistics. I wouldn't bother separating the
>> computation of heap and index statistics; I'd just have a "compute
>> statistics" phase that begins right after the comment that starts
>> "Compute the statistics."
>
> Understood. Fixed in the attached patch.
>>
>>
>> + /* Report that we are now doing index cleanup */
>> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> + PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);
>>
>> OK, this doesn't make any sense either. We are not cleaning up the
>> indexes here. We are just closing them and releasing resources. I
>> don't see why you need this phase at all; it can't last more than some
>> tiny fraction of a second. It seems like you're trying to copy the
>> exact same phases that exist for vacuum instead of thinking about what
>> makes sense for ANALYZE.
>
> Understood. I have removed this phase.
>>
>>
>> + /* Report number of heap blocks scaned so far*/
>> + pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
>> targblock);
>>
>> While I don't think this is necessarily a bad thing to report, I find
>> it very surprising that you're not reporting what seems to me like the
>> most useful thing here - namely, the number of blocks that will be
>> sampled in total and the number of those that you've selected.
>> Instead, you're just reporting the length of the relation and the
>> last-sampled block, which is a less-accurate guide to total progress.
>> There are enough counters to report both things, so maybe we should.
>>
>> + /* Report total number of sample rows*/
>> + pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS,
>> numrows);
>>
>> As an alternative to the previous paragraph, yet another thing you
>> could report is number of rows sampled out of total rows to be
>> sampled. But this isn't the way to do it: you are reporting the
>> number of rows you sampled only once you've finished sampling. The
>> point of progress reporting is to report progress -- that is, to
>> report this value as it's updated, not just to report it when ANALYZE
>> is practically done and the value has reached its maximum.
>
> Understood.
> I have reported number of rows sampled out of total rows to be sampled.
> I have not reported the number of blocks that will be sampled in total.
> So currect pg_stat_progress_analyze view looks like:
>
> =# \d+ pg_stat_progress_analyze
> View "pg_catalog.pg_stat_progress_analyze"
> Column | Type | Collation | Nullable | Default | Storage
> | Descripti
> on
> ------------------------+---------+-----------+----------+---------+----------+----------
> ---
> pid | integer | | | | plain
> |
> datid | oid | | | | plain
> |
> datname | name | | | | plain
> |
> relid | oid | | | | plain
> |
> phase | text | | | |
> extended |
> num_target_sample_rows | bigint | | | | plain
> |
> num_rows_sampled | bigint | | | | plain
> |
> View definition:
> SELECT s.pid,
> s.datid,
> d.datname,
> s.relid,
> CASE s.param1
> WHEN 0 THEN 'initializing'::text
> WHEN 1 THEN 'collecting sample rows'::text
> WHEN 2 THEN 'computing statistics'::text
> ELSE NULL::text
> END AS phase,
> s.param2 AS num_target_sample_rows,
> s.param3 AS num_rows_sampled
> FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid,
> param1, param2, p
> aram3, param4, param5, param6, param7, param8, param9, param10)
> LEFT JOIN pg_database d ON s.datid = d.oid;
>
> Comment?
>>
>> The documentation for this patch isn't very good, either. You forgot
>> to update the part of the documentation that says that VACUUM is the
>> only command that currently supports progress reporting, and the
>> descriptions of the phases and progress counters are brief and not
>> entirely accurate - e.g. heap_blks_scanned is not actually the number
>> of heap blocks scanned, because we are sampling, not reading all the
>> blocks.
>
> Understood. I have updated the documentation.
> I will also try to improve documentation.
>>
>> When adding calls to the progress reporting functions, please consider
>> whether a blank line should be added before or after the new code, or
>> both, or neither. I'd say some blank lines are needed in a few places
>> where you didn't add them.
>
> +1. I have added blank lines in a few places.
>
> Regards,
> Vinayak Pokale
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2017-03-18 12:28:55 Re: Logical replication existing data copy
Previous Message Robert Haas 2017-03-18 11:45:17 Re: wait events for disk I/O