RE: [HACKERS] Recovery on incomplete write

From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Bruce Momjian" <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgreSQL(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: RE: [HACKERS] Recovery on incomplete write
Date: 1999-10-05 09:25:48
Message-ID: 000701bf0f13$9c0790c0$2801007e@cadzone.tpf.co.jp
Lists: pgsql-hackers

>
> > -----Original Message-----
> > From: Bruce Momjian [mailto:maillist(at)candle(dot)pha(dot)pa(dot)us]
> > Sent: Tuesday, September 28, 1999 11:54 PM
> > To: Tom Lane
> > Cc: Hiroshi Inoue; pgsql-hackers
> > Subject: Re: [HACKERS] Recovery on incomplete write
> >
> >
> > > "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> > > > I have wondered that md.c handles incomplete block(page)s
> > > > correctly.
> > > > Am I mistaken ?
> > >
> > > I think you are right, and there may be some other trouble spots in that
> > > file too. I remember thinking that the code depended heavily on never
> > > having a partial block at the end of the file.
> > >
> > > But is it worth fixing? The only way I can see for the file length
> > > to become funny is if we run out of disk space part way through writing
> > > a page, which seems unlikely...
> > >
> >
> > That is how he got started, the TODO item about running out of disk
> > space causing corrupted databases. I think it needs a fix, if we can.
> >
>
> Maybe it isn't so difficult to fix.
> I would provide a patch.
>

Here is a patch.

1) mdnblocks() ignores a partial block at the end of a relation file.
2) mdread() returns a zero-filled page for block number 0 when the
file holds only a partial block.
3) mdextend() moves its write position back to a multiple of BLCKSZ
before writing.
4) mdextend() truncates the extra bytes after an incomplete write.
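
As a rough illustration of points 1) and 3), the block arithmetic can be sketched in standalone C. This is not part of the patch: the function names and SKETCH_BLCKSZ are made up for this example, and 8192 is only an assumed block size.

```c
#include <assert.h>

/* Hypothetical stand-in for BLCKSZ; the value 8192 is an assumption. */
#define SKETCH_BLCKSZ 8192L

/*
 * Block count as computed before the patch: a partial trailing
 * block is rounded up and counted as a whole block.
 */
static long
nblocks_old(long file_len, long blcksz)
{
    long len = file_len - 1;

    assert(blcksz > 0);
    return (len < 0) ? 0 : 1 + len / blcksz;
}

/*
 * Block count as computed after the patch: integer division
 * drops a partial trailing block.
 */
static long
nblocks_new(long file_len, long blcksz)
{
    assert(blcksz > 0);
    return file_len / blcksz;
}

/*
 * Round an end-of-file offset down to a block boundary, mirroring
 * the position adjustment mdextend() makes before writing.
 */
static long
align_down(long pos, long blcksz)
{
    assert(blcksz > 0);
    return blcksz * (pos / blcksz);
}
```

For a file of 8292 bytes (one full block plus 100 stray bytes), nblocks_old() reports 2 blocks, nblocks_new() reports 1, and align_down() gives 8192 as the offset at which the next block would be written, so the stray bytes are overwritten rather than left in the middle of the file.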

If there are no objections, I will commit this change to the current
tree.

Regards.

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp

*** storage/smgr/md.c.orig Thu Sep 30 10:50:58 1999
--- storage/smgr/md.c Tue Oct 5 13:30:55 1999
***************
*** 233,239 ****
int
mdextend(Relation reln, char *buffer)
{
! long pos;
int nblocks;
MdfdVec *v;

--- 233,239 ----
int
mdextend(Relation reln, char *buffer)
{
! long pos, nbytes;
int nblocks;
MdfdVec *v;

***************
*** 243,250 ****
if ((pos = FileSeek(v->mdfd_vfd, 0L, SEEK_END)) < 0)
return SM_FAIL;

! if (FileWrite(v->mdfd_vfd, buffer, BLCKSZ) != BLCKSZ)
return SM_FAIL;

/* remember that we did a write, so we can sync at xact commit */
v->mdfd_flags |= MDFD_DIRTY;
--- 243,264 ----
if ((pos = FileSeek(v->mdfd_vfd, 0L, SEEK_END)) < 0)
return SM_FAIL;

! if (pos % BLCKSZ != 0) /* the last block is incomplete */
! {
! pos = BLCKSZ * (long)(pos / BLCKSZ);
! if (FileSeek(v->mdfd_vfd, pos, SEEK_SET) < 0)
! return SM_FAIL;
! }
!
! if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ)) != BLCKSZ)
! {
! if (nbytes > 0)
! {
! FileTruncate(v->mdfd_vfd, pos);
! FileSeek(v->mdfd_vfd, pos, SEEK_SET);
! }
return SM_FAIL;
+ }

/* remember that we did a write, so we can sync at xact commit */
v->mdfd_flags |= MDFD_DIRTY;
***************
*** 432,437 ****
--- 446,453 ----
{
if (nbytes == 0)
MemSet(buffer, 0, BLCKSZ);
+ else if (blocknum == 0 && nbytes > 0 && mdnblocks(reln) == 0)
+ MemSet(buffer, 0, BLCKSZ);
else
status = SM_FAIL;
}
***************
*** 1067,1072 ****
{
long len;

! len = FileSeek(file, 0L, SEEK_END) - 1;
! return (BlockNumber) ((len < 0) ? 0 : 1 + len / blcksz);
}
--- 1083,1088 ----
{
long len;

! len = FileSeek(file, 0L, SEEK_END);
! return (BlockNumber) (len / blcksz);
}
