From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Subject: | Re: Online checksums verification in the backend |
Date: | 2020-03-16 03:29:28 |
Message-ID: | 20200316030638.GA2331@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 11, 2020 at 08:18:23AM +0100, Julien Rouhaud wrote:
> The cfbot reported a build failure, so here's a rebased v2 which also contains
> the pg_stat_report_failure() call and extra log info.
+ * - if a block is not found in shared_buffers, the LWLock is relased and the
+ * block is read from disk without taking any lock. If an error is detected,
+ * the read block will be discarded and retrieved again while holding the
+ * LWLock. This is because an error due to concurrent write is possible but
+ * very unlikely, so it's better to have an optimistic approach to limit
+ * locking overhead
This can be risky with false positives, no? With a large amount of
shared buffer eviction you actually increase the risk of torn page
reads. Instead of a logic relying on partition mapping locks, which
could be unwise on performance grounds, did you consider different
approaches? For example a kind of pre-emptive lock on the page in
storage to prevent any shared buffer operation to happen while the
block is read from storage, that would act like a barrier.
+ * Vacuum's GUCs are used to avoid consuming too much resources while running
+ * this tool.
Shouldn't this involve separate GUCs instead of the VACUUM ones? I
guess that this leads to the fact that this function may be better as
a contrib module, with the addition of some better-suited APIs in core
(see paragraph above).
+Run
+ make check
+or
+ make installcheck
Why is installcheck mentioned here?
I don't think that it is appropriate to place the SQL-callable part in
the existing checksum.c. I would suggest instead a new file, say
checksumfuncs.c in src/backend/utils/adt/, holding any SQL functions
for checksums.
-SUBDIRS = perl regress isolation modules authentication recovery
subscription
+SUBDIRS = perl regress isolation modules authentication check_relation \
+ recovery subscription
It seems to me that this test would be a good fit for
src/test/modules/test_misc/.
+static void
+check_all_relations(TupleDesc tupdesc, Tuplestorestate *tupstore,
+ ForkNumber forknum)
Per the argument of bloat, I think that I would remove
check_all_relation() as this function could take a very long time to
run, and just make the SQL function strict.
+ * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to
+ * disk either before the end of the next checkpoint or during recovery in
+ * case of unsafe shutdown
Wouldn't it actually be a good thing to check that the page on storage
is fine in this case? This depends on the system settings and the
checkpoint frequency, but checkpoint_timeout can be extended up to 1
day. And plenty of things could happen to the storage in one day,
including a base backup that includes a corrupted page on storage,
that this function would not be able to detect.
+ * - if a block is otherwise found in shared_buffers, an IO lock is taken on
+ * the block and the block is then read from storage, ignoring the block in
+ * shared_buffers
Yeah, I think that you are right here to check the page on storage
anyway.
+ * we detect if a block is in shared_buffers or not. See get_buffer()
+ * comments for more details about the locking strategy.
get_buffer() does not exist in your patch, check_get_buffer() does.
+ * - if a block is not found in shared_buffers, the LWLock is relased and the
[...]
+ * To avoid torn page and possible false postives when reading data, and
Typos.
+ if (!DataChecksumsEnabled())
+ elog(ERROR, "Data checksums are not enabled");
Note that elog() is for the class of errors which are never expected,
and here a caller of pg_check_relation() with checksums disabled can
trigger that. So you need to call ereport() with
ERRCODE_FEATURE_NOT_SUPPORTED.
+ * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to
+ * disk either before the end of the next checkpoint or during recovery in
+ * case of unsafe shutdown
Not sure that the indentation is going to react well on that part of
the patch, perhaps it would be better to add some "/*-------" at the
beginning and end of the comment block to tell pgindent to ignore this
part?
Based on the feedback gathered on this thread, I guess that you should
have a SRF returning the list of broken blocks, as well as NOTICE
messages. Another thing to consider is the addition of a range
argument to only check a certain portion of the blocks, say one
segment file at a time, etc. Fine by me to not include in the first
flavor of the patch.
The patch needs documentation.
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2020-03-16 03:46:47 | Re: [HACKERS] WAL logging problem in 9.4.3? |
Previous Message | Masahiko Sawada | 2020-03-16 03:26:55 | Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager |