From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Michael Banck <michael(dot)banck(at)credativ(dot)de> |
Cc: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, alvherre(at)2ndquadrant(dot)com, mailings(at)oopsware(dot)de, thomas(dot)munro(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Progress reporting for pg_verify_checksums |
Date: | 2019-04-01 06:10:41 |
Message-ID: | 20190401061041.GA1685@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Mar 30, 2019 at 09:07:41PM +0100, Michael Banck wrote:
> The way you've now changed this is that there's a function call into
> progress_report() for every block that's being read, even if there is no
> progress reporting requested. That looks like a pretty severe
> performance problem so I suggest to at least stick with checking
> showprogress before calling progress_report() and not the other way
> round.
>
> So my vote is in favour of only calling progress_report() every once in
> a while - I saw quite a speedup (or removal of slowdown) due to this in
> my tests, this was not just some unwarranted microoptimization.
Do you have some runtime numbers? If all the pages are in the OS
cache you should be able to see more impact. I have been testing with
a pgbench database at scale 300 and --check, and perf is showing me
that progress_report counts for 0.10% with or without the option of
the profile while pg_checksum_page() counts for 36%. Switching the
switch in or out of it makes the performance change lost in the noise.
I'd rather keep the check for showprogress within progress_report() so
as extra callers don't miss bypassing the report if --progress is not
enabled, still we are talking about only one caller now so the
attached does it the other way around with an assertion.
>> + fprintf(stderr, "\r");
>
> I think the isatty() check that was in our versions of the patch is
> useful here, did you check how this looks when piping the output to a
> file?
Oops, fixed. The output was strange.
> This hunk is from the performance optimization of calling
> progress_report only every 1024 blocks and could be removed if we stick
> with calling it for every block anyway (but see above).
Indeed, removed this one.
>> -static void
>> -scan_directory(const char *basedir, const char *subdir)
>> +/*
>> + * Scan the given directory for items which can be checksummed and
>> + * operate on each one of them. If "sizeonly" is true, the size of
>> + * all the items which have checksums is computed and returned back
>
> "computed" is maybe a bit strong a word here, how about "added up"?
"computed" sounds fine to me.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
checksums-progress-report-v2.patch | text/x-diff | 7.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-04-01 06:39:45 | Re: REINDEX CONCURRENTLY 2.0 |
Previous Message | Masahiko Sawada | 2019-04-01 05:26:15 | Re: New vacuum option to do only freezing |