From: | "Bossart, Nathan" <bossartn(at)amazon(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | "x4mmm(at)yandex-team(dot)ru" <x4mmm(at)yandex-team(dot)ru>, "a(dot)lubennikova(at)postgrespro(dot)ru" <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: archive status ".ready" files may be created too early |
Date: | 2021-03-15 16:34:29 |
Message-ID: | E63E5670-6CC3-4B09-9686-A77CF94FE4A8@amazon.com |
Lists: | pgsql-hackers |
On 2/18/21, 4:10 PM, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote:
> Alright, I've attached a new patch set for this.
>
> 0001 is similar to the last patch I sent in this thread, although it
> contains a few fixes. The main difference is that we no longer
> initialize lastNotifiedSeg in StartupXLOG(). Instead, we initialize
> it in XLogWrite() where we previously were creating the archive status
> files. This ensures that standby servers do not create many
> unnecessary archive status files after promotion.
>
> 0002 adds logic for persisting the last notified segment through
> crashes. This is needed because a poorly-timed crash could otherwise
> cause us to skip marking segments as ready-for-archival altogether.
> The persisted state is only used on primary servers, as there is a
> separate code path for marking segments as ready-for-archival on standbys.
>
> I considered attempting to prevent this bug from affecting standby
> servers by withholding WAL for a segment until the previous segment
> has been marked ready-for-archival. However, that would require us to
> track record boundaries even with archiving turned off. Also, my
> patch relied on the assumption that the flush pointer advances along
> record boundaries except for records that span multiple segments.
> This assumption is likely not always true, and even if it is, it seems
> pretty fragile. Furthermore, I suspect that there are still problems
> with standbys since the code path responsible for creating archive
> status files on standbys has even less context about the WAL record
> boundaries. IMO patches 0001 and 0002 should be the focus for now,
> and related bugs for standby servers should be picked up in a new
> thread.
>
> I ended up not touching archive_timeout at all. The documentation for
> this parameter seems to be written ambiguously enough that any
> small differences in behavior with these patches are still acceptable.
> I don't expect that users will see much change. In the worst case,
> the timer for archive_timeout may get reset a bit before the segment's
> archive status file is created.
I've attached a set of rebased patches.
Nathan
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Avoid-creating-archive-status-.ready-files-too-ea.patch | application/octet-stream | 13.7 KB |
v2-0002-Keep-track-of-notified-ready-for-archive-position.patch | application/octet-stream | 10.6 KB |