From: | Peter Schuller <peter(dot)schuller(at)infidyne(dot)com> |
---|---|
To: | Greg Smith <gsmith(at)gregsmith(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: lru_multiplier and backend page write-outs |
Date: | 2008-11-06 23:04:01 |
Message-ID: | 20081106230400.GA95142@hyperion.scode.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> > no table was ever large enough that 256k buffers would ever be filled by
> > the process of vacuuming a single table.
>
> Not 256K buffers--256K, 32 buffers.
Ok.
> > In addition, when I say "constantly" above I mean that the count
> > increases even between successive SELECT:s (of the stat table) with
> > only a second or two in between.
>
> Writes to the database when only doing read operations are usually related
> to hint bits: http://wiki.postgresql.org/wiki/Hint_Bits
Sorry, I didn't mean to imply read-only operations (I did read the
hint bits information a while back though). What I meant was that
while I was constantly generating the insert/delete/update activity, I
was selecting the bg writer stats with only a second or two in
between. The intent was to convey that the count of backend written
pages was systematically and constantly (as in a few hundreds per
handful of seconds) increasing, in spite of no long running vacuum and
the buffer cache not being close to full.
> > On this topic btw, was it considered to allow the administrator to
> > specify a fixed-size margin to use when applying the JIT policy?
>
> Right now, there's no way to know exactly what's in the buffer cache
> without scanning the individual buffers, which requires locking their
> headers so you can see them consistently. No one process can get the big
> picture without doing something intrusive like that, and on a busy system
> the overhead of collecting more data to know how exactly far ahead the
> cleaning is can drag down overall performance. A lot can happen while the
> background writer is sleeping.
Understood.
> One next-generation design which has been sketched out but not even
> prototyped would take cleaned buffers and add them to the internal list of
> buffers that are free, which right now is usually empty on the theory that
> cached data is always more useful than a reserved buffer. If you
> developed a reasonable model for how many buffers you needed and padded
> that appropriately, that's the easiest way (given the rest of the buffer
> manager code) to get close to ensuring there aren't any backend writes.
> Because you've got the OS buffering writes anyway in most cases, it's hard
> to pin down whether that actually improved worst-case latency though. And
> moving in that direction always seems to reduce average throughput even in
> write-heavy benchmarks.
Ok.
> The important thing to remember is that the underlying OS has its own read
> and write caching mechanisms here, and unless the PostgreSQL ones are
> measurably better than those you might as well let the OS manage the
> problem instead.
The problem though is that though the OS may be good in the common
cases it is designed for, it can have specific features that are
directly counter-productive if your goals do not line up with that of
the commonly designed-for use case (in particular, if you care about
latency a lot and not necessarily about absolute max throughput).
For example, in Linux up until recently if not still, there is the
1024 per-inode buffer limit that limited the number of buffers written
as a result of expiry, which means that when PostgreSQL does its
fsync(), you may end up having a lot more to write out than what would
have been the case if the centisecs_expiry had been enforced,
regardless of whether PostgreSQL was tuned to write dirty pages out
sufficiently aggressively. If the amount built up exceeds the capacity
of the RAID controller cache...
I had a case where I suspect this was exaserbating the
situation. Manually doing a 'sync' on the system every few seconds
noticably helped (the theory being, because it forced page write-outs
to happen earlier and in smaller storms).
> It's easy to demonstrate that's happening when you give
> a decent amount of memory to shared_buffers, it's much harder to prove
> that's the case for an improved write scheduling algorithm. Stepping back
> a bit, you might even consider that one reason PostgreSQL has grown as
> well as it has in scalability is exactly because it's been riding
> improvements the underlying OS in many of these cases, rather than trying
> to do all the I/O scheduling itself.
Sure. In this case with the backend writes, I am nore interesting in
understanding better what is happening and having better indications
of when backends block on I/O, than necessarily having a proven
improvement in throughput or latency. It makes it easier to reason
about what is happening when you *do* have a measured performance
problem.
Thanks for all the insightful information.
--
/ Peter Schuller
PGP userID: 0xE9758B7D or 'Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>'
Key retrieval: Send an E-Mail to getpgpkey(at)scode(dot)org
E-Mail: peter(dot)schuller(at)infidyne(dot)com Web: http://www.scode.org
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2008-11-06 23:04:03 | Re: Create and drop temp table in 8.3.4 |
Previous Message | Scott Marlowe | 2008-11-06 22:45:57 | Re: Create and drop temp table in 8.3.4 |