From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] TODO item |
Date: | 2000-02-07 16:40:17 |
Message-ID: | 20018.949941617@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
>> possibly fix #2 by having transaction commit invoke the pg_fsync_pending
>> scan before it updates pg_log (and then fsyncing pg_log itself again
>> after).
> I do not understand #2. I call pg_fsync_pending twice in
> RecordTransactionCommit, one is after FlushBufferPool, and the other
> is after TransactionIdCommit and FlushBufferPool. Or am I missing
> something?
Oh, OK. That's what I meant. The snippet you posted didn't show where
you were calling the fsync routine from.
> I thought about that too. If the ordering was that important, a
> database managed by backends with -F on could be seriously
> corrupted. I've never heard of such disasters caused by -F.
This is why I think that fsync actually offers very little extra
protection ;-)
> BTW, Hiroshi has pointed out an excellent point #3 to me:
>> This backend has to force the flush of a free buffer
>> page. Unfortunately the page was dirtied by the
>> above operation of Session-1 and calls pg_fsync()
>> for the table A. However fsync() is postponed until
>> commit of this backend.
>>
>> Session-1
>> commit;
>> There's no dirty buffer page for the table A.
>> So pg_fsync() isn't called for the table A.
Oooh, right. Backend A dirties the page, but leaves it sitting in
a shared buffer. Backend B needs the buffer space, so it does the
fwrite of the page. Now if backend A wants to commit, it can fsync
everything it's written --- but does that guarantee the page that
was actually written by B will get flushed to disk? Not sure.
If the pending-fsync logic is based on either physical fds or vfds
then it definitely *won't* work; A might have found the desired page
sitting in buffer cache to begin with, and never have opened the
underlying file at all!
So it seems you would need to keep a list of all the relation files (and
segments) you've written to in the current xact, and open and fsync each
one just before writing/fsyncing pg_log. Even then, you're assuming
that fsync applied to a file via an fd belonging to one backend will
flush disk buffers written to the same file via *other* fds belonging
to *other* processes. I'm not sure that that is true on all Unixes...
heck, I'm not sure it's true on any. The fsync(2) man page here isn't
real specific.
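A minimal sketch of the bookkeeping described above, in C. The names
remember_written_file, fsync_written_files and MAX_PENDING are hypothetical
and are not taken from the PostgreSQL source. The list is keyed by file path
rather than by fd or vfd, so a file gets recorded even if this backend never
opened it. The sketch simply assumes that fsync through a freshly opened fd
also flushes pages written by other processes, which is exactly the point
left open above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING 64          /* arbitrary cap, chosen for this sketch */

/* Paths of relation files (and segments) touched by the current xact. */
static char *pending_paths[MAX_PENDING];
static int  n_pending = 0;

/*
 * Record that the current transaction has modified a page of this file.
 * Duplicates are ignored.  Returns 0 on success, -1 if the list is full
 * or memory cannot be allocated.
 */
int
remember_written_file(const char *path)
{
    int   i;
    char *copy;

    assert(path != NULL);
    for (i = 0; i < n_pending; i++)
        if (strcmp(pending_paths[i], path) == 0)
            return 0;           /* already recorded */
    if (n_pending >= MAX_PENDING)
        return -1;
    copy = malloc(strlen(path) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, path);
    pending_paths[n_pending++] = copy;
    return 0;
}

/*
 * Open and fsync every recorded file, then empty the list.  Intended to
 * run just before pg_log is written and fsynced.  Returns 0 if every
 * open and fsync succeeded, -1 if any of them failed.
 */
int
fsync_written_files(void)
{
    int i;
    int fd;
    int result = 0;

    for (i = 0; i < n_pending; i++)
    {
        fd = open(pending_paths[i], O_RDWR);
        if (fd < 0)
            result = -1;
        else
        {
            if (fsync(fd) != 0)
                result = -1;
            close(fd);
        }
        free(pending_paths[i]);
        pending_paths[i] = NULL;
    }
    n_pending = 0;
    return result;
}
```

In this sketch a backend would call remember_written_file wherever it dirties
a page, and fsync_written_files once at commit. A real implementation would
presumably keep the list on failure and abort the commit instead of clearing
it.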
regards, tom lane