| From: | Andres Freund <andres(at)anarazel(dot)de> | 
|---|---|
| To: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: WAL recycle retading based on active sync rep. | 
| Date: | 2016-11-18 18:16:22 | 
| Message-ID: | 20161118181622.hklschaizwaxocl7@alap3.anarazel.de | 
| Lists: | pgsql-hackers | 
Hi,
On 2016-11-18 14:12:42 +0900, Kyotaro HORIGUCHI wrote:
> We saw WAL being recycled too early during a test on a synchronous
> replication setup. This is not a bug, and it is a somewhat extreme
> case, but it is contrary to what one expects from synchronous replication.
I don't think you can expect anything else.
> This is because sync replication doesn't wait for non-commit WAL
> records to be replicated. The situation can be reproduced artificially
> with the first patch attached and the following steps.
You could get that situation even if we waited for syncrep. The
SyncRepWaitForLSN happens after delayChkpt is unset.
Additionally, a syncrep connection could break for a short while, and
you'd lose all guarantees anyway.
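To make the ordering point concrete, here is a toy model of a simplified commit sequence. It is a sketch only, not PostgreSQL source: the `Backend` class, its fields, and both functions are hypothetical names, and the only ordering taken from the message above is that the sync rep wait comes after delayChkpt is cleared.

```python
# Toy model, NOT PostgreSQL source. All names here are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Backend:
    delay_chkpt: bool = False        # stands in for the delayChkpt flag
    flushed_locally: bool = False    # commit record flushed on the primary
    standby_confirmed: bool = False  # sync standby has acknowledged the commit


def commit_steps(b):
    """Run a simplified commit, yielding a label after each step."""
    b.delay_chkpt = True
    yield "set delayChkpt"
    b.flushed_locally = True
    yield "write and flush commit record"
    b.delay_chkpt = False
    yield "clear delayChkpt"
    # Only at this point does the backend wait for the synchronous standby.
    b.standby_confirmed = True
    yield "SyncRepWaitForLSN returns"


def unprotected_window():
    """Steps after which the flag no longer holds back a checkpoint,
    yet the standby has not confirmed the commit."""
    b = Backend()
    window = []
    for step in commit_steps(b):
        if not b.delay_chkpt and not b.standby_confirmed:
            window.append(step)
    return window


print(unprotected_window())
```

In this model the window is non-empty: there is a point at which the flag is already cleared while the standby has confirmed nothing, so waiting for syncrep at commit would not by itself close the gap.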
> - Is it necessary to guard against this situation? It is caused by a
>   large transaction that spans over two max_wal_size segments, or by a
>   replication stall that lasts for a checkpoint period.
I very strongly think not.
> - Is the measure acceptable?  In the worst case, the master
>   crashes from WAL space exhaustion. (But such a large transaction
>   won't/shouldn't exist?)
No, imo not.
Greetings,
Andres Freund