From: | Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com> |
Subject: | Re: WAL consistency check facility |
Date: | 2016-09-13 09:07:09 |
Message-ID: | CAGz5QCKHWQCUrCbT4vjMgYPoOJJjoi2_dT=5eZVWeL=8nZT76Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>>>> - If WAL consistency check is enabled for a rmgrID, we always include
>>>> the backup image in the WAL record.
>>>
>>>What happens if wal_consistency has different settings on a standby
>>>and its master? If for example it is set to 'all' on the standby, and
>>>'none' on the master, or vice-versa, how do things react? An update of
>>>this parameter should be WAL-logged, no?
>>
>> If wal_consistency is enabled for a rmid, standby will always check whether
>> backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not.
>> (I guess Amit and Robert also suggested the same in the thread)
>> Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and
>> BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup
>> image is required during redo. When we decode a wal record, has_image
>> flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO.
>
>Ah I see. But do we actually store the status in the record itself,
>then at replay we don't care of the value of wal_consistency at
>replay. That's the same concept used by wal_compression. So shouldn't
>you have more specific checks related to that in checkConsistency? You
>actually don't need to check for anything in xlogreader.c, just check
>for the consistency if there is a need to do so, or do nothing.
I'm sorry, but I don't quite follow you here. If a wal record contains
an image, has_image should be set since it helps decoding the
record. But, during redo if XLogRecHasBlockImage() returns true, i.e.,
has_image is set, then it always restore the block. But, in our case,
a record can have a backup image which should not be restored. So, we need
to decide two things:
1. Does a record contain backup image? (required for decoding the record)
2. If it has an image, should we restore it during redo?
I think we sould decide these in DecodeXLogRecord() only. BKPBLOCK_HAS_IMAGE
answers the first question whereas BKPIMAGE_IS_REQUIRED_FOR_REDO
answers the second one. In DecodeXLogRecord(), we check that
BKPBLOCK_HAS_IMAGE should be set if wal_consistency is enabled for
this record. The flag has_image is set to
BKPIMAGE_IS_REQUIRED_FOR_REDO which is later used to decide whether we
want to restore a block or not.
> For now, I've kept this as a WARNING message to detect all inconsistencies
> at once. Once, the patch is finalized, I'll modify it as an ERROR message.
Or say FATAL. This way the server is taken down.
> Thoughts?
+1. I'll do that.
>A couple of extra thoughts:
>1) The routines of each rmgr are located in a dedicated file, for
>example GIN stuff is in ginxlog.c, etc. It seems to me that it would
>be better to move each masking function where it should be instead
>being centralized. A couple of routines need to be centralized, so I'd
>suggest putting them in a new file, like xlogmask.c, xlog because now
>this is part of WAL replay completely, including the lsn, the hint
>bint and the other common routines.
Sounds good. But, I think that the file name for common masking routines
should be as bufmask.c since we are masking the buffers only.
>2) Regarding page comparison:
>+/*
>+ * Compare the contents of two pages.
>+ * If the two pages are exactly same, it returns BLCKSZ. Otherwise,
>+ * it returns the location where the first mismatch has occurred.
>+ */
>+int
>+comparePages(char *page1, char *page2)
>We could just use memcpy() here. compareImages was useful to get a
>clear image of what the inconsistencies were, but you don't do that
>anymore.
memcmp(), right?
>5)
>+ has_image = record->blocks[block_id].has_image;
>+ record->blocks[block_id].has_image = true;
>+ if (!RestoreBlockImage(record, block_id, old_page))
>+ elog(ERROR, "failed to restore block image");
>+ record->blocks[block_id].has_image = has_image;
>Er, what? And BKPIMAGE_IS_REQUIRED_FOR_REDO?
Sorry, I completely missed this.
>6)
>+ /*
>+ * Remember that, if WAL consistency check is enabled for
>the current rmid,
>+ * we always include backup image with the WAL record.
>But, during redo we
>+ * restore the backup block only if needs_backup is set.
>+ */
>+ if (needs_backup)
>+ bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO;
>This should use wal_consistency[rmid]?
needs_backup is set when XLogRecordAssemble decides that backup image
should be included in the record for redo purpose. This image will be
restored during
redo. BKPIMAGE_IS_REQUIRED_FOR_REDO indicates whether the included
image should be restored during redo(or has_image should be set or not).
>7) This patch has zero documentation. Please add some. Any human being
>on this list other than those who worked on the first versions
>(Heikki, Simon and I?) is going to have a hard time to review this
>patch in details moving on if there is no reference to tell what this
>feature does for the user...
>
>This patch is going to the good direction, but I don't think it's far
>from being ready for commit yet. So I am going to mark it as returned
>with feedback if there are no objections
I think only major change that this patch needs a proper and detailed
documentation. Other than that there are very minor changes which can
be done quickly. Right?
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Albe Laurenz | 2016-09-13 10:35:01 | Nested loop join condition does not get pushed down to foreign scan |
Previous Message | Heikki Linnakangas | 2016-09-13 08:44:01 | Re: Supporting SJIS as a database encoding |