From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | noah(at)leadboat(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz |
Subject: | Re: [HACKERS] WAL logging problem in 9.4.3? |
Date: | 2019-05-14 04:59:10 |
Message-ID: | 20190514.135910.258194307.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello.
At Sun, 12 May 2019 17:37:05 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190513003705(dot)GA1202614(at)rfd(dot)leadboat(dot)com>
> On Sun, Mar 31, 2019 at 03:31:58PM -0700, Noah Misch wrote:
> > On Sun, Mar 10, 2019 at 07:27:08PM -0700, Noah Misch wrote:
> > > I also liked the design in the https://postgr.es/m/559FA0BA.3080808@iki.fi
> > > last paragraph, and I suspect it would have been no harder to back-patch. I
> > > wonder if it would have been simpler and better, but I'm not asking anyone to
> > > investigate that.
> >
> > Now I am asking for that. Would anyone like to try implementing that other
> > design, to see how much simpler it would be?
Yeah, I think it is a bit too-complex for the value. But I think
it is the best way as far as we keep reusing a file on
truncation of the whole file.
> Anyone? I've been deferring review of v10 and v11 in hopes of seeing the
> above-described patch first.
The siginificant portion of the complexity in this patch comes
from need to behave differently per block according to remebered
logged and truncated block numbers.
0005:
+ * NB: after WAL-logging has been skipped for a block, we must not WAL-log
+ * any subsequent actions on the same block either. Replaying the WAL record
+ * of the subsequent action might fail otherwise, as the "before" state of
+ * the block might not match, as the earlier actions were not WAL-logged.
+ * Likewise, after we have WAL-logged an operation for a block, we must
+ * WAL-log any subsequent operations on the same page as well. Replaying
+ * a possible full-page-image from the earlier WAL record would otherwise
+ * revert the page to the old state, even if we sync the relation at end
+ * of transaction.
+ *
+ * If a relation is truncated (without creating a new relfilenode), and we
+ * emit a WAL record of the truncation, we can't skip WAL-logging for any
+ * of the truncated blocks anymore, as replaying the truncation record will
+ * destroy all the data inserted after that. But if we have already decided
+ * to skip WAL-logging changes to a relation, and the relation is truncated,
+ * we don't need to WAL-log the truncation either.
If this consideration holds and given the optimizations on
WAL-skip and truncation, there's no way to avoid the per-block
behavior as far as we are allowing mixture of
logged-modifications and WAL-skipped COPY on the same relation
within a transaction.
We could avoid the per-block behavior change by making the
wal-inhibition per-relation basis. That will reduce the patch
size by the amount of BufferNeedsWALs and log_heap_update, but
not that large.
inhibit wal-skipping after any wal-logged modifications in the relation.
inhibit wal-logging after any wal-skipped modifications in the relation.
wal-skipped relations are synced at commit-time.
truncation of wal-skipped relation creates a new relfilenode.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-05-14 05:22:15 | Re: [HACKERS] Unlogged tables cleanup |
Previous Message | Andres Freund | 2019-05-14 04:33:52 | Re: [HACKERS] Unlogged tables cleanup |