Re: Load distributed checkpoint

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Load distributed checkpoint
Date: 2007-01-10 04:51:32
Message-ID: 20070110133319.DC70.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches

I wrote:
> I'm thinking about generalizing your idea; Adding three parameters
> to control sleeps in each stage.

I sent a patch to -patches that adds 3+1 GUC parameters for checkpoints.
Three of them control the sleeps in each stage of a checkpoint.
The last is an experimental replacement for fsync() that allows finer control.

1. checkpoint_write_duration (default=0, in seconds)
Sets the duration of the write() phase in checkpoints.
2. checkpoint_nap_duration (default=0, in seconds)
Sets the pause between the write() and fsync() phases in checkpoints.
3. checkpoint_sync_duration (default=0, in seconds)
Sets the duration of the fsync() phase in checkpoints.

The 1st parameter spreads out write(). If you set checkpoint_write_duration
to 90% of checkpoint_timeout, it behaves the same as the patch I sent before.
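
As an illustration of how spreading write() over checkpoint_write_duration
could work, here is a minimal Python sketch (not the patch's C code). The
name nap_needed and its arguments are hypothetical. It assumes the writer
tracks how many dirty buffers it has written and how much time has elapsed,
and sleeps only when it is ahead of a linear schedule.

```python
def nap_needed(buffers_written: int, total_buffers: int,
               elapsed_s: float, write_duration_s: float) -> float:
    """Seconds to sleep so that write progress follows a linear schedule.

    Hypothetical helper: write_duration_s plays the role of
    checkpoint_write_duration; 0 means "do not spread writes".
    """
    if total_buffers <= 0 or write_duration_s <= 0:
        return 0.0
    # Time at which this many buffers *should* have been written.
    target_elapsed = write_duration_s * buffers_written / total_buffers
    # Sleep only if we are ahead of schedule; never return a negative nap.
    return max(0.0, target_elapsed - elapsed_s)


if __name__ == "__main__":
    checkpoint_timeout_s = 300
    write_duration_s = checkpoint_timeout_s * 0.9  # the 90% setting above
    print(nap_needed(500, 1000, 100.0, write_duration_s))
```

With checkpoint_timeout = 300s, the 90% setting gives a 270s write phase. A
writer that is halfway through its buffers after 100s would nap for about
35s; one that is behind schedule would not sleep at all.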

The 2nd is the nap time between the write() and fsync() phases. The kernel's
writer may get much of the work done if you set it to around 30-60s, which
might be useful for some traditional UNIXes, as you say. In contrast, the 1st
was the most valuable on my machines (Windows and Linux), for some reason.

The 3rd spreads out fsync(). This parameter only has an effect when you have
several tables or a very large table (one that consists of several 1GB
files), because fsync() works on a per-file basis.
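
To illustrate why the 3rd parameter needs several files to have any effect,
here is a rough Python sketch. It assumes that a table is stored as 1GB
files and that the sync phase sleeps an equal interval after each fsync().
segment_count and fsync_gap are hypothetical names, not functions from the
patch.

```python
SEGMENT_BYTES = 1 << 30  # assumed size of one data file (1GB)


def segment_count(table_bytes: int) -> int:
    """Number of files a table of this size occupies (at least one)."""
    return max(1, -(-table_bytes // SEGMENT_BYTES))  # ceiling division


def fsync_gap(file_count: int, sync_duration_s: float) -> float:
    """Sleep after each fsync() so that file_count calls span sync_duration_s.

    With a single file there is nothing to spread, so the gap is 0.
    """
    if file_count <= 1 or sync_duration_s <= 0:
        return 0.0
    return sync_duration_s / file_count


if __name__ == "__main__":
    files = segment_count(5 * SEGMENT_BYTES)
    print(files, fsync_gap(files, 30.0))
```

A 5GB table yields five fsync() targets, so a 30s checkpoint_sync_duration
leaves room for a pause of about 6s after each one. A table that fits in one
file gets a single fsync() and no spreading.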

Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> To summarize, if we could have fsync() only write the dirty buffers that
> happened as part of the checkpoint, we could delay the write() for the
> entire time between checkpoints, but we can't do that, so we have to
> make it user-tunable.

The 3rd parameter has the limitation described above, so I added another one.

4. checkpoint_sync_size (default=0, in KB)
Sets the synchronization unit of data files in checkpoints.

It uses sync_file_range or mmap/msync/munmap to split file synchronization
into units of the specified size. I think 16-64MB suits machines whose
performance is limited by fsync() during checkpoints.
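
The splitting can be sketched in Python as follows. This only computes the
(offset, length) pairs that would be handed to a range-based sync call such
as sync_file_range; it does not perform any I/O. The function name
sync_ranges is hypothetical, and treating 0 as "sync the whole file at once"
is an assumption based on the default value.

```python
def sync_ranges(file_bytes: int, sync_size_kb: int) -> list[tuple[int, int]]:
    """Split a file into (offset, length) units of sync_size_kb kilobytes.

    sync_size_kb plays the role of checkpoint_sync_size. A value of 0 is
    assumed to mean one range covering the whole file.
    """
    if sync_size_kb <= 0:
        return [(0, file_bytes)]
    step = sync_size_kb * 1024
    # The last range is shortened so it does not run past the end of the file.
    return [(off, min(step, file_bytes - off))
            for off in range(0, file_bytes, step)]


if __name__ == "__main__":
    one_gb = 1 << 30
    ranges = sync_ranges(one_gb, 16 * 1024)  # 16MB units
    print(len(ranges), ranges[0], ranges[-1])
```

With 16MB units a 1GB file is synchronized in 64 steps, which gives the
checkpoint 64 places to sleep inside a single file instead of one long
fsync().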

The feature is incomplete. For example, sync_file_range does not actually
flush file metadata (in that respect it is closer to fdatasync than to
fsync), so we may lose data with this patch. That must be fixed, but I want
to measure the benefit first.

I'm interested in which parameters are useful in which environments.
Any comments and test reports would be appreciated.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
