Re: Excessive (and slow) fsync() within single transaction

From: Stephen Tyler <stephen(at)stephen-tyler(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Excessive (and slow) fsync() within single transaction
Date: 2009-12-09 11:24:38
Message-ID: 51549ea20912090324x43dcab45vb22d4dafe902e126@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Dec 9, 2009 at 12:57 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> You should turn on log_checkpoint in the postgresql.conf and confirm the
> slowdowns are happening around the same time as the checkpoint report gets
> written to the log files.

Thank you for your suggestions. I will turn on log_checkpoint next time I
get the chance to restart.

Some more details about my current config (almost no activity since boot
except the one large update still in progress):

select * from pg_stat_bgwriter;
checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean |
maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
440 | 51 | 39248820 | 8876042
| 85527 | 3952243 | 48756188
and a couple of hours later:
465 | 56 | 43316686 | 9855720
| 95058 | 3997393 | 53713342

My analysis:

checkpoints_timed increments roughly every 5 minutes, and checkpoints_req
increments roughly every 30 minutes.

Around 80% of the checkpoints are timed, rather than requested. But the
pauses/freezes are of the order of 30 to 60 seconds apart (roughly aligned
with similar slow fsync calls), not 5 minutes apart. There are multiple
freezes per checkpoint.

During each checkpoint, 83K * 8K blocks were written out, or 650MB on
average per checkpoint. Around 20% were cleaned by the background cleaner.
if this occured at anywhere near my drive's average random write speed it
should take less than 20 seconds even if 100% random disc blocks.

I don't understand how maxwritten_clean could be as high as 95058, and
increment at more than 1 per second. This is a process count, not a buffer
count? How often is the background cleaner launched? Does that mean I need
to massively increase bgwriter_lru_maxpages, and other bgwriter params?
They are currently default values.

On Wed, Dec 9, 2009 at 12:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> fsyncs mostly happen as a result of checkpoint activity. I wonder
> whether you have done any tuning of checkpoint intervals,
> shared_buffers, etc.
>

I wouldn't describe the values as "tuned" ... more like set and forget ...
postgresql.conf (currently):

max_connections = 50
shared_buffers = 4GB
work_mem = 64MB
maintenance_work_mem = 1GB
synchronous_commit = 'off'
wal_buffers = 8MB
checkpoint_segments = 128
autovacuum = 'on'
effective_cache_size = 12GB
default_statistics_target = 100
constraint_exclusion = 'on'

shared_buffers is 1/4 of RAM, and effective_cache_size is 3/4 RAM. The
system is not swapping, and currently 9.5GB of RAM is marked as "Inactive".

checkpoint_segments is large, but in the suggested range of 32 to 256 for
large, write-heavy, batch-oriented systems. I've used smaller values in the
past, but I got lots of checkpoint warnings in the console.

wal_buffers might be a bit high. I don't fully understand the implications,
and how to tune this value, even after reading the tuning guides.

On Wed, Dec 9, 2009 at 4:40 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> possible resolutions might include:
> *) you might luck out and be dealing with known hopefully corrected
> issue with intel drives
>

Latest firmware for the 50nm version. I haven't found any known issue.
During the pauses, "Activity Monitor" reports no disk activity. But I am
not sure if this implies no SATA traffic.

> *) tweak checkpoints/bgwriter to try and reduce the penalty of fsync
>

Make them more frequent but smaller, or less frequent but bigger? Which way
has less penalty?

> *) disable fsync...risky, but I bet will make the problem go away
>

I'm very tempted to live dangerously. I would like to actually turn off
fsync's during a transaction, and just use the fsync at the very end of the
transaction to confirm that the transaction will be permanent. Or maybe use
the filesystem journaling function to ensure I can roll-back to a valid
state if something bad happens mid-transaction. But these do not seem to be
options.

> *) optimize your query. maybe try rewriting as delete/insert select
> from scratch table?
>

Unfortunately, I have a lot of foreign keys with "ON DELETE CASCADE", and
various other triggers. While that greatly simplified my housekeeping code,
I can no longer do delete & reinsert.

I can break the big transactions into hundreds or thousands of little
transactions. This does let me stop/restart the system at will, but I've
yet to see it help my throughput.

Maybe I've overlooked something, but nothing looks too wrong with
postgresql.conf to me, based on my reading of the tuning guides. One
checkpoint write every 5 minutes (80% timed) doesn't seem too bad.
Something else (probably the very slow fsync) is keeping the checkpoint
writing rate low, and there appear to be a lot more fsync's than checkpoint
segments.

Stephen

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message George Silva 2009-12-09 11:38:33 Re: Rules and conditions
Previous Message Leif Biberg Kristensen 2009-12-09 09:07:26 Re: Rules and conditions