Re: Is pg_control file crashsafe?

From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Is pg_control file crashsafe?
Date: 2016-05-06 08:13:37
Message-ID: 572C5231.800@postgrespro.ru
Lists: pgsql-hackers


On 05.05.2016 7:16, Amit Kapila wrote:
> On Wed, May 4, 2016 at 8:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
> > > wrote:
> > >> On 03.05.2016 2:17, Tom Lane wrote:
> > >>> Writing a single sector ought to be atomic too.
> >
> > >> pg_control is 8k long (I think that is the length of one page in default PG
> > >> compile settings).
> >
> > > The actual data written is always sizeof(ControlFileData) which should be
> > > less than one sector.
> >
> > Yes. We don't care what happens to the rest of the file as long as the
> > first sector's worth is updated atomically. See the comments for
> > PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
> >
> > We could change to a different PG_CONTROL_SIZE pretty easily, and there's
> > certainly room to argue that reducing it to 512 or 1024 would be more
> > efficient. I think the motivation for setting it at 8K was basically
> > "we're already assuming that 8K writes are efficient, so let's assume
> > it here too". But since the file is only written once per checkpoint,
> > efficiency is not really a key selling point anyway. If you could make
> > an argument that some other size would reduce the risk of failures,
> > it would be interesting --- but I suspect any such argument would be
> > very dependent on the quirks of a specific file system.
> >
>
> How about using 512 bytes as the write size and performing direct writes
> rather than going via the OS buffer cache for the control file? Alex, is the
> issue reproducible (so that if we try to solve it in some way, we also
> have a way to test it)?
>
> >
> > One point worth considering is that on most file systems, rewriting
> > a fraction of a page is *less* efficient than rewriting a full page,
> > because the kernel first has to read in the old contents to fill
> > the disk buffer it's going to partially overwrite with new data.
> > This motivates against trying to reduce the write size too much.
> >
>
> Yes, you are quite right, and I observed that recently during my
> work on WAL Re-Writes [1]. However, I think that won't be an issue if
> we use direct writes for the control file.
>
>
> [1] -
> http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jBtjXBFyD67R5dLcqFyOMj4f-qmFXBP1OOQ@mail.gmail.com
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Hi!
No, the issue happened only once. Also, all attempts to reproduce it have been
unsuccessful so far.
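
For reference, the scheme discussed in the quoted text (a small struct that
fits in one sector, stored at the start of a larger zero-padded file and
protected by a checksum) can be sketched roughly as below. This is only an
illustration under stated assumptions, not the actual PostgreSQL code:
DemoControlData, demo_crc32, demo_write_control and demo_read_control are
hypothetical names, DEMO_CONTROL_FILE_SIZE merely stands in for
PG_CONTROL_SIZE, and the 512-byte sector size is an assumption about the
storage device. The sketch always rewrites the whole padded file, which is
a simplification.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define DEMO_CONTROL_FILE_SIZE 8192  /* stand-in for PG_CONTROL_SIZE */
#define DEMO_SECTOR_SIZE       512   /* assumed atomic write unit */

/* Hypothetical stand-in for ControlFileData; not the real layout. */
typedef struct DemoControlData
{
    uint64_t system_identifier;
    uint64_t checkpoint_location;
    uint32_t state;
    uint32_t crc;                    /* covers all bytes before this field */
} DemoControlData;

/* Plain bitwise CRC-32, used here only to keep the sketch self-contained. */
static uint32_t
demo_crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Write the struct at offset 0 of a zero-padded file, then fsync it. */
int
demo_write_control(const char *path, const DemoControlData *data)
{
    unsigned char buffer[DEMO_CONTROL_FILE_SIZE];
    uint32_t    crc;
    int         fd;

    /* The whole argument relies on the payload fitting in one sector. */
    assert(sizeof(DemoControlData) <= DEMO_SECTOR_SIZE);

    memset(buffer, 0, sizeof(buffer));
    memcpy(buffer, data, sizeof(DemoControlData));
    crc = demo_crc32(buffer, offsetof(DemoControlData, crc));
    memcpy(buffer + offsetof(DemoControlData, crc), &crc, sizeof(crc));

    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, buffer, sizeof(buffer)) != (ssize_t) sizeof(buffer) ||
        fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Read only the leading struct and reject it if the checksum mismatches. */
int
demo_read_control(const char *path, DemoControlData *out)
{
    unsigned char raw[sizeof(DemoControlData)];
    uint32_t    stored;
    ssize_t     nread;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    nread = read(fd, raw, sizeof(raw));
    close(fd);
    if (nread != (ssize_t) sizeof(raw))
        return -1;

    memcpy(&stored, raw + offsetof(DemoControlData, crc), sizeof(stored));
    if (stored != demo_crc32(raw, offsetof(DemoControlData, crc)))
        return -1;               /* torn or corrupted contents */

    memcpy(out, raw, sizeof(raw));
    return 0;
}
```

The reader never looks past the first sizeof(DemoControlData) bytes, so the
padding only determines the write size. The checksum can detect a torn or
corrupted write after the fact, but it cannot prevent one.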
