From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | TipTop Labs <office(at)tiptop-labs(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #14999: pg_rewind corrupts control file global/pg_control |
Date: | 2018-04-04 18:50:12 |
Message-ID: | 22961.1522867812@sss.pgh.pa.us |
Lists: | pgsql-bugs pgsql-hackers |
Michael Paquier <michael(at)paquier(dot)xyz> writes:
> So after that I fell back to your patch and began testing it, which is
> where I noticed that we can *never* guarantee that we can recover a data
> folder on which an error has happened in the middle of a pg_rewind. The
> reason for that is quite simple: even if the truncation has been moved
> down to the moment where the first chunk of a file is received, you may
> have already done work on some relation files. Particularly, some of
> them may have been truncated down to a given size without a new range of
> blocks fetched from the source. So the data folder would be in an
> inconsistent state if trying to rewind it again.

Yes, we certainly cannot guarantee that failure partway through pg_rewind
leaves a consistent state of the target data directory. It is likely
worth pointing that out in the documentation. Whether we can or should
do anything about it is a different question.

When I first started looking at this thread, I wondered if maybe somebody
had had in mind to create an active defense against starting a postmaster
in an inconsistent target cluster, by dint of intentionally truncating
pg_control before the transfer starts and not making it valid again till
the very end. It's now clear from looking at the code that that's not
what's going on :-(. But I wonder how hard it would be to make it so,
and whether that'd be worth doing if it's not too hard.

Actually, probably a safer way to attack that would be to remove or
rename the topmost PG_VERSION file, and then put it back afterwards.
That'd be far easier to recover from manually, if need be, than
clobbering pg_control.
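
To make the rename idea concrete, here is a minimal sketch in Python rather than pg_rewind's C. It is only an illustration of the proposal, not what pg_rewind does today. The helper name `version_file_guard` and the `.rewind-in-progress` suffix are made up for this example, and the sketch ignores durability concerns such as fsyncing the directory after each rename.

```python
import os
from contextlib import contextmanager

# Hypothetical suffix chosen for this sketch; pg_rewind defines no such name.
MARKER_SUFFIX = ".rewind-in-progress"


@contextmanager
def version_file_guard(datadir):
    """Hide the topmost PG_VERSION while the data directory is being modified.

    PG_VERSION is renamed out of the way first and renamed back only if the
    guarded block finishes without raising an exception.
    """
    version_path = os.path.join(datadir, "PG_VERSION")
    hidden_path = version_path + MARKER_SUFFIX

    os.rename(version_path, hidden_path)
    # No try/finally on purpose: if the guarded block raises, the rename
    # below is skipped, so PG_VERSION stays hidden until it is restored
    # by hand.
    yield hidden_path
    os.rename(hidden_path, version_path)
```

A caller would wrap the whole transfer in `with version_file_guard(datadir): ...`. If the transfer fails partway, the directory is left without PG_VERSION, which the server should reject as an invalid data directory, and manual recovery amounts to renaming one small file back.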
In any case, that seems separate from the question of what to do with
read-only files in the data directory. Should we push forward with
committing Michael's previous patch, and leave that issue for later?

regards, tom lane