Re: Disaster!

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Martín Marqués <martin(at)bugs(dot)unl(dot)edu(dot)ar>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disaster!
Date:	2004-01-24 17:52:43
Message-ID:	3813.1074966763@sss.pgh.pa.us
Lists:	pgsql-hackers

I said:
> If there wasn't disk space enough to hold the clog page, the checkpoint
> attempt should have failed. So it may be that allowing a short read in
> slru.c would be patching the symptom of a bug that is really elsewhere.

After more staring at the code, I have a theory. SlruPhysicalWritePage
and SlruPhysicalReadPage are coded on the assumption that close() can
never return any interesting failure. However, it now occurs to me that
there are some filesystem implementations wherein ENOSPC could be
returned at close() rather than the preceding write(). (For instance,
the HPUX man page for close() states that this never happens on local
filesystems but can happen on NFS.) So it'd be possible for
SlruPhysicalWritePage to think it had successfully written a page when
it hadn't. This would allow a checkpoint to complete even though the
clog page was never actually stored :-(
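
To make the theory concrete, here is a minimal sketch of a page write that treats a failing close() as a failed write. It is not the actual slru.c code: the function name write_page_checked, its parameters, and the return convention are all hypothetical. Mapping a short write with no errno to ENOSPC is likewise an assumption of this sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not the slru.c code):
 * write one page at the given offset and report failure if any of
 * open(), lseek(), write() or close() fails.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
write_page_checked(const char *path, const char *page, size_t len, off_t offset)
{
    int     fd;
    ssize_t written;
    int     save_errno;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    if (lseek(fd, offset, SEEK_SET) < 0)
    {
        save_errno = errno;
        close(fd);
        errno = save_errno;
        return -1;
    }

    errno = 0;
    written = write(fd, page, len);
    if (written != (ssize_t) len)
    {
        /* Assumption: a short write that left errno unset means no space. */
        save_errno = (errno != 0) ? errno : ENOSPC;
        close(fd);
        errno = save_errno;
        return -1;
    }

    /*
     * The point of the sketch: on some filesystems (NFS, per the HPUX
     * man page) an error such as ENOSPC may only be reported here.
     */
    if (close(fd) != 0)
        return -1;

    return 0;
}
```

A caller would treat a -1 return exactly like a failed write(), so that the checkpoint attempt fails instead of completing.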

Chris, what's your platform exactly, and what kind of filesystem are
you storing pg_clog on?

regards, tom lane

In response to

Re: Disaster! at 2004-01-24 17:40:19 from Tom Lane

Responses

Re: Disaster! at 2004-01-25 00:21:45 from Christopher Kings-Lynne
Re: Disaster! at 2004-01-26 18:04:12 from Bruce Momjian
