Re: Progress reporting for pg_verify_checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, alvherre(at)2ndquadrant(dot)com, mailings(at)oopsware(dot)de, thomas(dot)munro(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress reporting for pg_verify_checksums
Date: 2019-03-30 20:07:41
Message-ID: 1553976461.4884.66.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

so we are basically back at the original patch as written by Bernd :)

> +/*
> + * Report current progress status. Parts borrowed from
> + * PostgreSQLs' src/bin/pg_basebackup.c
> + */
> +static void
> +progress_report(bool force)
> +{
> +     int                     percent;
> +     char            total_size_str[32];
> +     char            current_size_str[32];
> +     pg_time_t       now;
> +
> +     if (!showprogress)
> +             return;

The way you've now changed this is that there's a function call into
progress_report() for every block that's being read, even if there is no
progress reporting requested. That looks like a pretty severe
performance problem so I suggest to at least stick with checking
showprogress before calling progress_report() and not the other way
round.

I guess there is still some noticeable performance penalty to pay when
reporting progress, even more so now that you reverted to only doing it
once per second - if we report progress, we presumably call
report_progress() thousands of times and return early 99,9% of the time.

So my vote is in favour of only calling progress_report() every once in
a while - I saw quite a speedup (or removal of slowdown) due to this in
my tests, this was not just some unwarranted microoptimization.

> +     fprintf(stderr, "\r");

I think the isatty() check that was in our versions of the patch is
useful here, did you check how this looks when piping the output to a
file?

> @@ -195,12 +254,23 @@ scan_file(const char *fn, BlockNumber segmentno)
>                                       _("%s: checksums enabled in file
\"%s\"\n"), progname, fn);
>       }
>  
> +     /* Make sure progress is reported at least once per file */
> +     progress_report(false);
> +
>       close(f);
>  }

This hunk is from the performance optimization of calling
progress_report only every 1024 blocks and could be removed if we stick
with calling it for every block anyway (but see above).

> -static void
> -scan_directory(const char *basedir, const char *subdir)
> +/*
> + * Scan the given directory for items which can be checksummed and
> + * operate on each one of them.  If "sizeonly" is true, the size of
> + * all the items which have checksums is computed and returned back

"computed" is maybe a bit strong a word here, how about "added up"?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-03-30 20:13:01 Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Previous Message Tomas Vondra 2019-03-30 19:25:43 Re: Enable data checksums by default