From: | "Scott Carey" <scott(at)richrelevance(dot)com> |
---|---|
To: | "Christiaan Willemsen" <cwillemsen(at)technocon(dot)com> |
Cc: | jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Configuring for maximum memory usage |
Date: | 2008-10-31 15:47:20 |
Message-ID: | a1ec7d000810310847j10060c84m40c6e080ada59289@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Thu, Oct 30, 2008 at 2:06 PM, Christiaan Willemsen <
cwillemsen(at)technocon(dot)com> wrote:
> Thanks guys,
> Lots of info here that I didn't know about! Since I have one of the latest
> Opensolaris builds, I guess the write throttle feature is already in there.
> Sadly, the blog doesn't say what build has it included.
>
If I recall correctly, it went in at about build 89 or so (I think the
bottom of the link I provided has a comment to that effect). So its in
there now, but not in OpenSolaris 2008.05.
>
> For writes, I do everything synchronized because we really need a
> consistent database on disk. We can see that during large inserts, the
> intend log is used a lot.
>
The DB synchronizes the WAL log automatically, and the table and index data
are written non-synchronously until the commit at the end of a checkpoint,
in which case sync is called on them. This keeps things consistent on
disk. With ZFS, each block written is always consistent, with a checksum
kept in the parent block. There are no partial page writes, ever. In
theory, you can disable full page writes on the WAL log if there is a
bottleneck there since ZFS guarantees fully transactional consistent state
of the file system, even if you have a RAID controller or hardware failure
that causes a partial write. But WAL log activity is probably not your
bottleneck so turning off full page writes on the WAL log is not necessary.
>
> What I'm going to te testing is a smaller shared_buffers value, and a
> large ARC cache, and exactly the other way around.
>
> Another question: since we have huge tables with hundreds of millions or
> rows, we partitioned the database (it actually is creating the partitions
> dynamically now on inserts with very good performance :D ), but the question
> is: is the size of the partions important for the memory parameters in
> config file? How can we determine the optimal size of the partition.
> obviously, when doing selects, you want those preferably only needing a
> single partition for speed. At the moment, that is for the majority of
> situations the case. But there might be some other things to think about...
>
> Kind regards,
>
> Christiaan
>
>
> On Oct 30, 2008, at 7:27 PM, Scott Carey wrote:
>
>
>
> On Thu, Oct 30, 2008 at 9:55 AM, Joshua D. Drake <jd(at)commandprompt(dot)com>wrote:
>
>> On Thu, 2008-10-30 at 09:46 -0700, Scott Carey wrote:
>>
>> >
>> > Remember that PostgreSQL doesn't cache anything on its own so
>> > if you do
>> > want to hit disk it has to be in file cache.
>> >
>> > By my understanding, this is absolutely false. Postgres caches pages
>> > from tables/indexes in shared_buffers. You can make this very large if
>> > you wish.
>>
>> You can make it very large with a potentially serious performance hit.
>> It is very expensive to manage large amounts of shared buffers. It can
>> also nail your IO on checkpoint if you are not careful (even with
>> checkpoint smoothing). You are correct that I did not explain what I
>> meant very well because shared buffers are exactly that, shared
>> buffers.
>
>
> You can slam your I/O by havnig too large of either OS file cache or
> shared_buffers, and you have to tune both.
> In the case of large shared_buffers you have to tune postgres and
> especially the background writer and checkpoints.
> In the case of a large OS cache, you have to tune parameters to limit the
> ammount of dirty pages there and force writes out smoothly.
> Both layers attempt to delay writes for their own, often similar reasons,
> and suffer when a large sync comes along with lots of dirty data.
>
> Recent ZFS changes have been made to limit this, (
> http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle)
> in earlier ZFS versions, this is what usually killed databases -- ZFS in
> some situations would delay writes too long (even if "long" is 5 seconds)
> and get in trouble. This still has to be tuned well, combined with good
> checkpoint tuning in Postgres as you mention. For Linux, there are similar
> issues that have to be tuned on many kernels, or up to 40% of RAM can fill
> with dirty pages not written to disk.
>
> Letting the OS do it doesn't get rid of the problem, both levels of cache
> share very similar issues with large sizes and dirty pages followed by a
> sync.
>
> The buffer cache in shared_buffers is a lot more efficient for large
> scanning queries -- A select count(*) test will be CPU bound if it comes
> from shared_buffers or the OS page cache, and in the former case I have seen
> it execute up to 50% faster than the latter, by avoiding calling out to the
> OS to get pages, purely as a result of less CPU used.
>
>
>
>>
>>
>> However that isn't the exact same thing as a "cache" at least as I was
>> trying to describe it. shared buffers are used to keep track of pages
>> (as well as some other stuff) and their current status. That is not the
>> same as caching a relation.
>>
>> It is not possible to pin a relation to memory using PostgreSQL.
>> PostgreSQL relies on the operating system for that type of caching.
>>
>
> The OS can't pin a relation either, from its point of view its all just a
> bunch of disk data blocks, not relations -- so it is all roughly
> equivalent. The OS can do a bit better job at data prefetch on sequential
> scans or other predictable seek sequences (ARC stands for Adaptive
> Replacement Cache) than postgres currently does (no intelligent prefetch in
> postgres AFAIK).
>
> So I apologize if I made it sound like Postgres cached the actual relation,
> its just pages -- but it is basically the same thing as the OS cache, but
> kept in process closer to the code that needs it. Its a cache that prevents
> disk reads.
>
> My suggestion for the OP is to try it both ways, and see what is better for
> his workload / OS / Hardware combination.
>
>
>>
>> Joshua D. Drake
>>
>>
>>
>>
>>
>> --
>>
>>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Frost | 2008-10-31 17:58:28 | Re: Index usage problem on 8.3.3 |
Previous Message | Gregory Stark | 2008-10-31 08:53:26 | Re: Index usage problem on 8.3.3 |