From: | ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net> |
Subject: | Re: Load distributed checkpoint |
Date: | 2006-12-26 09:58:26 |
Message-ID: | 20061226152731.5D53.ITAGAKI.TAKAHIRO@oss.ntt.co.jp |
Lists: | pgsql-hackers pgsql-patches |
Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I assume write() is not our checkpoint performance problem, but the
> transfer to disk via fsync(). Perhaps a simple solution is to do the
> write()'s of all dirty buffers as we do now at checkpoint time, but
> delay 30 seconds and then do fsync() on all the files.
I think there are two kinds of platforms with different checkpoint problems:
on one the bottleneck is fsync(), and on the other it is write(). Which one
you hit depends in a complex way on the OS, the amount of memory, the disks,
the write-back cache, and so on.
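One way to see which phase is expensive on a given machine is to time the
two calls separately. The following is only a rough standalone sketch, not
PostgreSQL code: the function names, the 8192-byte block size, and the
caller-supplied file path are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: seconds elapsed from *a to *b. */
double
elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double) (b->tv_sec - a->tv_sec) +
           (double) (b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical probe: write nblocks blocks of 8192 bytes to path, then
 * fsync() the file, and report how long each phase took.
 * Returns 0 on success, -1 on any I/O error.
 */
int
time_write_and_fsync(const char *path, int nblocks,
                     double *write_sec, double *fsync_sec)
{
    char            block[8192];
    struct timespec t0, t1, t2;
    int             fd;
    int             i;

    assert(path != NULL && nblocks > 0);
    assert(write_sec != NULL && fsync_sec != NULL);
    memset(block, 'x', sizeof(block));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Phase 1: hand the data to the kernel with write(). */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < nblocks; i++)
    {
        if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
        {
            close(fd);
            return -1;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Phase 2: force the data to disk with fsync(). */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t2);

    close(fd);
    *write_sec = elapsed_sec(&t0, &t1);
    *fsync_sec = elapsed_sec(&t1, &t2);
    return 0;
}
```

A small file will mostly measure the OS cache, so a file much larger than
free memory is probably needed before the numbers resemble a real checkpoint.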
> I think the basic difference between this and the proposed patch is that
> we do not put delays in the buffer write() or fsync() phases --- we just
> put a delay _between_ the phases, and wait for the kernel to smooth it
> out for us. The kernel certainly knows more about what needs to get to
> disk, so it seems logical to let it do the I/O smoothing.
The two proposals do not conflict with each other. Also, a solution for
either platform has no bad effect on the other platform. Can we employ
both of them?
I tested your proposal, but it did not help on a write-critical machine,
where the slowdown occurs in write(). However, if the idea works well on BSD
or some other platforms, it would be worth adopting.
[pgbench results]
...
566.973777
327.158222 <- (1) write()
560.773868 <- (2) sleep
544.106645 <- (3) fsync()
...
[code changes]
(This is a bad implementation because shutdown takes a long time!)
void
FlushBufferPool(void)
{
    BufferSync();                           // (1) write -- about 20s
    time_t start = time(NULL);
    while (time(NULL) - start < 30)         // (2) sleep -- 30s
    {
        pg_usleep(BgWriterDelay * 1000L);
        BgBufferSync();
        AbsorbFsyncRequests();
    }
    smgrsync();                             // (3) fsync -- less than 200ms
}
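The shutdown problem noted above could perhaps be avoided by letting the
delay loop end early. The following is a minimal standalone sketch under
that assumption, not a tested patch: delay_with_early_exit(),
stop_when_flag_set() and the callback arguments are hypothetical names, and
the caller would have to supply whatever shutdown check the backend
actually uses.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stop check: arg points to a bool that is set on shutdown. */
bool
stop_when_flag_set(void *arg)
{
    return *(bool *) arg;
}

/*
 * Hypothetical delay loop: wait up to delay_sec seconds between the write
 * and fsync phases, calling tick(arg) after each nap of tick_usec
 * microseconds, but leave as soon as should_stop(arg) returns true.
 * Returns the number of ticks that were performed.
 */
int
delay_with_early_exit(int delay_sec, long tick_usec,
                      bool (*should_stop) (void *),
                      void (*tick) (void *),
                      void *arg)
{
    struct timespec nap;
    time_t          start = time(NULL);
    int             ticks = 0;

    assert(delay_sec >= 0 && tick_usec >= 0);
    assert(should_stop != NULL);

    nap.tv_sec = tick_usec / 1000000L;
    nap.tv_nsec = (tick_usec % 1000000L) * 1000L;

    while (time(NULL) - start < delay_sec)
    {
        if (should_stop(arg))
            break;              /* e.g. shutdown requested: skip the rest */
        nanosleep(&nap, NULL);
        if (tick != NULL)
            tick(arg);          /* periodic work done during the delay */
        ticks++;
    }
    return ticks;
}
```

In the loop above, the tick callback would stand in for the BgBufferSync()
and AbsorbFsyncRequests() calls, and the fsync phase would follow
immediately once the function returns.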
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center