Re: [HACKERS] TODO item

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] TODO item
Date: 2000-02-07 16:40:17
Message-ID: 20018.949941617@sss.pgh.pa.us
Lists: pgsql-hackers

Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
>> possibly fix #2 by having transaction commit invoke the pg_fsync_pending
>> scan before it updates pg_log (and then fsyncing pg_log itself again
>> after).

> I do not understand #2. I call pg_fsync_pending twice in
> RecordTransactionCommit, one is after FlushBufferPool, and the other
> is after TransactionIdCommit and FlushBufferPool. Or am I missing
> something?

Oh, OK. That's what I meant. The snippet you posted didn't show where
you were calling the fsync routine from.

> I thought about that too. If the ordering was that important, a
> database managed by backends with -F on could be seriously
> corrupted. I've never heard of such disasters caused by -F.

This is why I think that fsync actually offers very little extra
protection ;-)

> BTW, Hiroshi has pointed out an excellent point #3 to me:

>> This backend has to force the flush of a free buffer
>> page. Unfortunately the page was dirtied by the
>> above operation of Session-1 and calls pg_fsync()
>> for the table A. However fsync() is postponed until
>> commit of this backend.
>>
>> Session-1
>> commit;
>> There's no dirty buffer page for the table A.
>> So pg_fsync() isn't called for the table A.

Oooh, right. Backend A dirties the page, but leaves it sitting in
shared buffer. Backend B needs the buffer space, so it does the
fwrite of the page. Now if backend A wants to commit, it can fsync
everything it's written --- but does that guarantee the page that
was actually written by B will get flushed to disk? Not sure.

If the pending-fsync logic is based on either physical fds or vfds
then it definitely *won't* work; A might have found the desired page
sitting in buffer cache to begin with, and never have opened the
underlying file at all!

So it seems you would need to keep a list of all the relation files (and
segments) you've written to in the current xact, and open and fsync each
one just before writing/fsyncing pg_log. Even then, you're assuming
that fsync applied to a file via an fd belonging to one backend will
flush disk buffers written to the same file via *other* fds belonging
to *other* processes. I'm not sure that that is true on all Unixes...
heck, I'm not sure it's true on any. The fsync(2) man page here isn't
real specific.
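
A minimal sketch of that bookkeeping, in C, to make the idea concrete. It is
an illustration only, not code from the backend: the names
remember_written_file and fsync_pending_files, the fixed-size path table, and
the limits are all hypothetical. It remembers files by path rather than by fd,
then opens and fsyncs each one at commit time. It inherits the assumption
questioned above, namely that fsync through a freshly opened fd also flushes
data written to the same file by other processes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING  64     /* hypothetical limit, for illustration only */
#define MAX_PATH_LEN 256    /* hypothetical limit, for illustration only */

/* Hypothetical per-transaction list of files this backend has dirtied. */
static char pending_paths[MAX_PENDING][MAX_PATH_LEN];
static int  pending_count = 0;

/*
 * Remember a relation file (or segment) by path, not by fd, so that it can
 * be synced even if this process never opened it. Duplicates are ignored.
 * Returns 0 on success, -1 if the table is full or the path is too long.
 */
int remember_written_file(const char *path)
{
    assert(path != NULL);
    for (int i = 0; i < pending_count; i++)
        if (strcmp(pending_paths[i], path) == 0)
            return 0;
    if (pending_count >= MAX_PENDING || strlen(path) >= MAX_PATH_LEN)
        return -1;
    strcpy(pending_paths[pending_count++], path);
    return 0;
}

/*
 * Open and fsync every remembered file. Intended to run just before the
 * commit record is written. Returns the number of files synced, or -1 on
 * the first failure (the list is left intact in that case).
 */
int fsync_pending_files(void)
{
    int synced = 0;
    for (int i = 0; i < pending_count; i++) {
        int fd = open(pending_paths[i], O_RDWR);
        if (fd < 0)
            return -1;
        if (fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        close(fd);
        synced++;
    }
    pending_count = 0;      /* list is consumed once everything is synced */
    return synced;
}
```

A caller would invoke remember_written_file() wherever a page of a relation
is dirtied, and fsync_pending_files() immediately before updating and
fsyncing pg_log. Whether the second step really covers pages that another
backend wrote out is exactly the open question.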

regards, tom lane
