Sorted writes in checkpoint

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Greg Smith <gsmith(at)gregsmith(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Subject:	Sorted writes in checkpoint
Date:	2007-06-14 07:39:37
Message-ID:	20070614153758.6A62.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists:	pgsql-hackers pgsql-patches

Greg Smith <gsmith(at)gregsmith(dot)com> wrote:

> On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote:
> > If the kernel can treat sequential writes better than random writes, is
> > it worth sorting dirty buffers in block order per file at the start of
> > checkpoints?

I wrote and tested the attached sorted-writes patch, based on Heikki's
ldc-justwrites-1.patch. There was an obvious performance win on OLTP workloads.

 tests                     | pgbench | DBT-2 response time (avg/90%/max)
---------------------------+---------+-----------------------------------
 LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
 + BM_CHECKPOINT_NEEDED(*) | 187 tps | 0.83 / 2.68 /  9.26 s
 + Sorted writes           | 224 tps | 0.36 / 0.80 /  8.11 s

(*) Don't write buffers that were dirtied after starting the checkpoint.

machine : 2GB-ram, SCSI*4 RAID-5
pgbench : -s400 -t40000 -c10 (about 5GB of database)
DBT-2 : 60WH (about 6GB of database)
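
To make the two steps in the table concrete, here is a minimal sketch of the
idea, not code taken from sorted-ckpt.patch. At the start of the checkpoint it
marks the buffers that are dirty at that moment, then sorts them by file and
block number so that writes are issued in block order per file. The types
BufDesc and DirtyBuf, the SK_* flags, and collect_checkpoint_buffers() are
hypothetical names for illustration; only qsort() is a real library call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag bits, loosely modeled on BM_DIRTY and BM_CHECKPOINT_NEEDED. */
#define SK_DIRTY             0x01
#define SK_CHECKPOINT_NEEDED 0x02

/* Hypothetical, simplified buffer descriptor (not PostgreSQL's BufferDesc). */
typedef struct BufDesc
{
    uint32_t file_id;    /* which relation file the page belongs to */
    uint32_t block_num;  /* block number within that file */
    unsigned flags;      /* SK_* bits */
} BufDesc;

/* One entry in the checkpoint's to-write list. */
typedef struct DirtyBuf
{
    uint32_t file_id;
    uint32_t block_num;
    int      buf_id;     /* index of the buffer in the pool */
} DirtyBuf;

/* Order by file first, then by block number within the file. */
static int
cmp_file_block(const void *a, const void *b)
{
    const DirtyBuf *x = a;
    const DirtyBuf *y = b;

    if (x->file_id != y->file_id)
        return (x->file_id < y->file_id) ? -1 : 1;
    if (x->block_num != y->block_num)
        return (x->block_num < y->block_num) ? -1 : 1;
    return 0;
}

/*
 * Mark every buffer that is dirty right now as needing a checkpoint write,
 * copy it into 'out' (which must have room for nbuffers entries), and sort
 * the list in (file, block) order.  Returns the number of entries.
 * Buffers dirtied later are not in the list, so this checkpoint skips them.
 */
static size_t
collect_checkpoint_buffers(BufDesc *pool, size_t nbuffers, DirtyBuf *out)
{
    size_t n = 0;

    assert(pool != NULL && out != NULL);

    for (size_t i = 0; i < nbuffers; i++)
    {
        if (pool[i].flags & SK_DIRTY)
        {
            pool[i].flags |= SK_CHECKPOINT_NEEDED;
            out[n].file_id = pool[i].file_id;
            out[n].block_num = pool[i].block_num;
            out[n].buf_id = (int) i;
            n++;
        }
    }

    qsort(out, n, sizeof(DirtyBuf), cmp_file_block);
    return n;
}
```

The write loop would then walk the sorted list in order and could skip any
entry whose flag has already been cleared, for example because another process
wrote the page in the meantime. The sort costs extra memory and CPU at
checkpoint start, which is part of the complexity being weighed in this thread.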

> I think it has the potential to improve things. There are three obvious
> and one subtle argument against it I can think of:
>
> 1) Extra complexity for something that may not help. This would need some
> good, robust benchmarking improvements to justify its use.

Exactly. I think we need a discussion board for I/O performance issues.
Can I use the Developers Wiki for this purpose? Performance graphs and
result tables are important for the discussion, so a wiki might serve
better than the mailing lists, which are text-based.

> 2) Block number ordering may not reflect actual order on disk. While
> true, it's got to be better correlated with it than writing at random.
> 3) The OS disk elevator should be dealing with this issue, particularly
> because it may really know the actual disk ordering.

Yes, both are true. However, I think there is a pretty high correlation
between those orderings. In addition, we should rely on the filesystem to
ensure that the two orderings correspond to each other. For example,
pre-allocation of files might help, as has often been discussed.

> Here's the subtle thing: by writing in the same order the LRU scan occurs
> in, you are writing dirty buffers in the optimal fashion to eliminate
> client backend writes during BufferAlloc. This makes the checkpoint a
> really effective LRU clearing mechanism. Writing in block order will
> change that.

That issue will probably go away once we have LDC, because it writes LRU
buffers during checkpoints.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment	Content-Type	Size
sorted-ckpt.patch	application/octet-stream	4.2 KB

In response to

Re: Controlling Load Distributed Checkpoints at 2007-06-11 07:51:51 from Greg Smith

Responses

Re: Sorted writes in checkpoint at 2007-06-14 11:45:21 from Gregory Stark
Re: Sorted writes in checkpoint at 2007-06-14 13:22:06 from Heikki Linnakangas
Re: Sorted writes in checkpoint at 2007-06-14 15:58:33 from Greg Smith
Re: Sorted writes in checkpoint at 2007-06-14 17:50:17 from Simon Riggs
Re: Sorted writes in checkpoint at 2008-03-11 20:05:01 from Bruce Momjian
