From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
Cc: | Jeff *EXTERN* <jeff(at)jefftrout(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Replaying 48 WAL files takes 80 minutes |
Date: | 2012-10-30 10:07:48 |
Message-ID: | 508FA6F4.8040905@vmware.com |
Lists: | pgsql-performance |
On 30.10.2012 10:50, Albe Laurenz wrote:
> Why does WAL replay read much more than it writes?
> I thought that pretty much every block read during WAL
> replay would also get dirtied and hence written out.
Not necessarily. If a block is modified and written out of the buffer
cache before the next checkpoint, the latest version of the block is
already on disk. On replay, the redo routine reads the block, sees from
the page's LSN that the change was already applied, and does nothing.
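To make the read-without-write case concrete, here is a toy model of that check. It is only a sketch, not PostgreSQL's actual redo code: the `Page` class, the `redo` function, and the dict standing in for the disk are all hypothetical names chosen for illustration. The one assumption taken from PostgreSQL is that each page carries the LSN of the last WAL record applied to it.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int   # LSN of the last WAL record applied to this page
    data: str


def redo(disk, record_lsn, block_no, new_data, stats):
    """Replay one WAL record against a toy 'disk' (dict of block_no -> Page)."""
    page = disk[block_no]      # replay has to read the block to inspect its LSN
    stats["reads"] += 1
    if page.lsn >= record_lsn:
        return                 # change is already on disk: nothing to write
    page.lsn = record_lsn      # otherwise apply the change and dirty the page
    page.data = new_data
    stats["writes"] += 1


disk = {1: Page(lsn=200, data="new"), 2: Page(lsn=50, data="old")}
stats = {"reads": 0, "writes": 0}
redo(disk, 100, 1, "v100", stats)  # block 1 was flushed after LSN 100: skipped
redo(disk, 100, 2, "v100", stats)  # block 2 is stale: applied and dirtied
```

Both records cost a read, but only the second one dirties a page, which is how replay can end up reading more than it writes.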
> I wonder why the performance is good in the first few seconds.
> Why should exactly the pages that I need in the beginning
> happen to be in cache?
This is probably because of full_page_writes=on. When replay has a full
page image of a block, it doesn't need to read the old contents from
disk. It can just blindly write the image to disk. Writing a block to
disk also puts that block in the OS cache, so this also efficiently
warms the cache from the WAL. Hence in the beginning of replay, you just
write a lot of full page images to the OS cache, which is fast, and you
only start reading from disk after you've filled up the OS cache. If
this theory is true, you should see a pattern in the I/O stats: in
the first few seconds there is no disk I/O, but the CPU is 100% busy while
it reads from WAL and writes out the pages to the OS cache. After the OS
cache fills up with the dirty pages (up to dirty_ratio, on Linux), you
will start to see a lot of writes. As the replay progresses, you will
see more and more reads, as you start to get cache misses.
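One way to look for that pattern on Linux is to sample /proc/vmstat while the replay runs. The following is a minimal sketch under stated assumptions: the helper names (`parse_vmstat`, `io_delta`, `watch`) are made up for this example, and it assumes the kernel exposes the `pgpgin`, `pgpgout` (both in KiB) and `nr_dirty` (in pages) counters. Tools such as vmstat or iostat would show the same thing.

```python
import time


def parse_vmstat(text):
    """Turn the 'name value' lines of /proc/vmstat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def io_delta(before, after):
    """Disk traffic between two samples, plus the current dirty page count."""
    return {
        "read_kb": after["pgpgin"] - before["pgpgin"],
        "written_kb": after["pgpgout"] - before["pgpgout"],
        "dirty_pages": after["nr_dirty"],
    }


def watch(interval=1.0, samples=10, path="/proc/vmstat"):
    """Print one delta per interval; Linux only."""
    with open(path) as f:
        prev = parse_vmstat(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = parse_vmstat(f.read())
        print(io_delta(prev, cur))
        prev = cur
```

Calling `watch()` at the start of recovery should, if the theory holds, first show `dirty_pages` climbing with `read_kb` and `written_kb` near zero, then `written_kb` rising, and finally `read_kb` growing as cache misses begin. These counters are system-wide, so other activity on the machine will blur the picture.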
- Heikki