Re: Redesigning checkpoint_segments

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Redesigning checkpoint_segments
Date: 2013-08-23 21:53:30
Message-ID: 5217D9DA.5040004@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/23/2013 02:08 PM, Heikki Linnakangas wrote:

> Here's a bigger patch, which does more. It is based on the ideas in the
> post I started this thread with, with feedback incorporated from the
> long discussion. With this patch, WAL disk space usage is controlled by
> two GUCs:
>
> min_recycle_wal_size
> checkpoint_wal_size
>
<snip>

> These settings are fairly intuitive for a DBA to tune. You begin by
> figuring out how much disk space you can afford to spend on WAL, and set
> checkpoint_wal_size to that (with some safety margin, of course). Then
> you set checkpoint_timeout based on how long you're willing to wait for
> recovery to finish. Finally, if you have infrequent batch jobs that need
> a lot more WAL than the system otherwise needs, you can set
> min_recycle_wal_size to keep enough WAL preallocated for the spikes.

We'll want to rename them to make it even *more* intuitive.

But ... do I understand things correctly that checkpoint wouldn't "kick
in" until you hit checkpoint_wal_size? If that's the case, isn't real
disk space usage around 2X checkpoint_wal_size if spread checkpoint is
set to 0.9? Or does checkpoint kick in sometime earlier?

> except that it's more
> intuitive to set it in terms of "MB of WAL space required", instead of
> "# of segments between checkpoints".

Yes, it certainly is. We'll need to caution people that fractions of
16MB will be ignored.

> Does that make sense? I'd love to hear feedback on how people setting up
> production databases would like to tune these things. The reason for the
> auto-tuning between the min and max is to be able to set reasonable
> defaults e.g for embedded systems that don't have a DBA to do tuning.
> Currently, it's very difficult to come up with a reasonable default
> value for checkpoint_segments which would work well for a wide range of
> systems. The PostgreSQL default of 3 is way way too low for most
> systems. On the other hand, if you set it to, say, 20, that's a lot of
> wasted space for a small database that's not updated much. With this
> patch, you can set "max_wal_size=1GB" and if the database ends up
> actually only needing 100 MB of WAL, it will only use that much and not
> waste 900 MB for useless preallocated WAL files.

This sounds good, aside from the potential 2X issue I mention above.

Mind you, what admins really want is a hard limit on WAL size, so that
they can create a partition and not worry about PG running out of WAL
space. But ...

> Making it a hard limit is a much bigger task than I'm willing to tackle
> right now.

... agreed. And this approach could be built on for a hard limit later on.

As a note, pgBench would be a terrible test for this patch; we really
need something which creates uneven traffic. I'll see if I can devise
something.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-08-23 21:55:00 Re: Performance problem in PLPgSQL
Previous Message Heikki Linnakangas 2013-08-23 21:08:30 Re: Redesigning checkpoint_segments