Re: Incorrect handling of OOM in WAL replay leading to data loss

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, ethmertz(at)amazon(dot)com, nathandbossart(at)gmail(dot)com, pgsql(at)j-davis(dot)com, sawada(dot)mshk(at)gmail(dot)com
Subject:	Re: Incorrect handling of OOM in WAL replay leading to data loss
Date:	2023-08-09 07:35:09
Message-ID:	ZNNBrS0BumpGCJBd@paquier.xyz
Lists:	pgsql-hackers

On Wed, Aug 09, 2023 at 04:13:53PM +0900, Kyotaro Horiguchi wrote:
> I'm not certain if message_deferred is a property of the error
> struct. Callers don't seem to need that information.

True enough, will remove.

> The name "XLOG_READER_NONE" seems too generic. XLOG_READER_NOERROR will
> be clearer.

Or perhaps just XLOG_READER_NO_ERROR?

> 0002 shifts the behavior for the OOM case from ending recovery to
> retrying at the same record. If the last record is really corrupted,
> the server won't be able to finish recovery. I doubt we are good with
> this behavior change.

You mean on an incorrect xl_tot_len? Yes, that could be possible.
Another possibility would be retry logic with a hardcoded number of
attempts and a delay between each. Once the infrastructure is in
place, this still deserves more discussion, but we can be flexible.
The immediate FATAL is a choice.
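
To make the retry idea concrete, here is a minimal standalone sketch of a
bounded retry loop with a delay between attempts. It is not taken from the
patch set: ReadOutcome, read_record_with_retry(), flaky_read() and
count_wait() are hypothetical names invented for illustration, and the wait
is passed in as a callback so the sketch stays plain C (backend code would
presumably sleep with something like pg_usleep()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes; illustrative only, not the patch's API. */
typedef enum ReadOutcome
{
    READ_OK,        /* record was read successfully */
    READ_OOM,       /* allocation failed, possibly a transient condition */
    READ_INVALID    /* record is invalid: end of WAL or corruption */
} ReadOutcome;

typedef ReadOutcome (*read_record_fn) (void *arg);
typedef void (*wait_fn) (long delay_ms);

/*
 * Try to read one record, retrying only on out-of-memory.  Gives up after
 * max_attempts tries and waits delay_ms between consecutive tries.  Any
 * outcome other than READ_OOM is returned to the caller immediately.
 */
ReadOutcome
read_record_with_retry(read_record_fn read_record, void *arg,
                       int max_attempts, long delay_ms, wait_fn wait)
{
    assert(read_record != NULL && wait != NULL && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        ReadOutcome outcome = read_record(arg);

        if (outcome != READ_OOM)
            return outcome;
        if (attempt < max_attempts)
            wait(delay_ms);     /* no wait after the final attempt */
    }

    /* Still out of memory: the caller picks the policy (e.g. FATAL). */
    return READ_OOM;
}

/* Stub reader for experiments: reports OOM a fixed number of times. */
typedef struct FlakyReader
{
    int oom_failures_left;
    int calls;
} FlakyReader;

ReadOutcome
flaky_read(void *arg)
{
    FlakyReader *reader = arg;

    reader->calls++;
    if (reader->oom_failures_left > 0)
    {
        reader->oom_failures_left--;
        return READ_OOM;
    }
    return READ_OK;
}

/* Stub wait: records the requested delay instead of sleeping. */
long total_wait_ms = 0;

void
count_wait(long delay_ms)
{
    total_wait_ms += delay_ms;
}
```

With such a loop, a transient OOM gets a few chances to clear, while a
record whose xl_tot_len is bogus still ends in a bounded number of attempts
instead of retrying forever at the same record. What happens after the last
attempt (FATAL, or ending recovery) stays a separate policy decision.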
--
Michael

In response to

Re: Incorrect handling of OOM in WAL replay leading to data loss at 2023-08-09 07:13:53 from Kyotaro Horiguchi

Responses

Re: Incorrect handling of OOM in WAL replay leading to data loss at 2023-08-09 08:00:49 from Kyotaro Horiguchi