From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Move unused buffers to freelist
Date: 2013-06-27 13:50:32
Message-ID: CA+TgmoZK_-riTdfY=9e4k8WyHNMQZH-WHLw6Uf=B7gT18tBeEA@mail.gmail.com
Lists: pgsql-hackers
On Thu, Jun 27, 2013 at 9:01 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Contention-wise I agree. What I have seen is that we have a huge
> amount of cacheline bouncing around the buffer header spinlocks.
How did you measure that?
> I have previously added some adhoc instrumentation that printed the
> amount of buffers that were required (by other backends) during a
> bgwriter cycle and the amount of buffers that the buffer manager could
> actually write out.
I think you can see how many are needed from buffers_alloc. No?
> I don't think I actually found any workload where
> the bgwriter actually wrote out a relevant percentage of the necessary
> pages.
Check.
> Problems with the current code:
>
> * doesn't manipulate the usage_count and never does anything to used
> pages. Which means it will just about never find a victim buffer in a
> busy database.
Right. I was thinking that was part of this patch, but it isn't. I
think we should definitely add that. In other words, the background
writer's job should be to run the clock sweep and add buffers to the
free list. I think we should also split the lock: a spinlock for the
freelist, and an lwlock for the clock sweep.
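
A minimal sketch of that split, with pthread mutexes standing in for
PostgreSQL's spinlock/LWLock and the freelist kept as an index chain
roughly as in freelist.c (the names, sizes, and helper functions here
are illustrative, not from any patch):

```c
#include <pthread.h>

#define NBUFFERS 128

/* Simplified freelist: head index plus a next-free chain. */
static int firstFreeBuffer = -1;
static int freeNext[NBUFFERS];

/* Two locks instead of one global BufFreelistLock: a short-held lock
 * for the freelist, a separate one for the clock sweep. */
static pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t clocksweep_lock = PTHREAD_MUTEX_INITIALIZER;

/* Background writer side: push a clean, zero-usage buffer. */
static void push_free_buffer(int buf)
{
    pthread_mutex_lock(&freelist_lock);
    freeNext[buf] = firstFreeBuffer;
    firstFreeBuffer = buf;
    pthread_mutex_unlock(&freelist_lock);
}

/* Backend side: try the freelist first; a -1 result means the caller
 * must fall back to the clock sweep under clocksweep_lock, without
 * ever blocking backends that are still finding buffers here. */
static int pop_free_buffer(void)
{
    int buf;

    pthread_mutex_lock(&freelist_lock);
    buf = firstFreeBuffer;
    if (buf >= 0)
        firstFreeBuffer = freeNext[buf];
    pthread_mutex_unlock(&freelist_lock);
    return buf;
}
```

The point of the split is that the common path (pop from a non-empty
freelist) holds only a briefly-held spinlock-class lock, while the
expensive clock sweep contends on a different lock entirely.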
> * by far not aggressive enough, touches only a few buffers ahead of the
> clock sweep.
Check. Fixing this might be a separate patch, but then again maybe
not. The changes we're talking about here provide a natural feedback
mechanism: if we observe that the freelist is empty (or less than some
length, like 32 buffers?) set the background writer's latch, because
we know it's not keeping up.
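
That feedback check could look roughly like the following; the
threshold constant and the latch flag are assumptions for the sketch,
with an atomic flag standing in for SetLatch() on the background
writer's latch:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed tunable: wake the bgwriter once the freelist drains below
 * this many buffers (the "32 buffers?" figure from the discussion). */
#define FREELIST_WAKEUP_THRESHOLD 32

static atomic_int freelist_length;
static atomic_bool bgwriter_latch_set;  /* stand-in for SetLatch() */

/* Called by a backend after taking a buffer off the freelist: if the
 * list has drained below the threshold, wake the background writer so
 * it can refill the list before backends are forced into the clock
 * sweep themselves. */
static void note_buffer_allocated(void)
{
    int remaining = atomic_fetch_sub(&freelist_length, 1) - 1;

    if (remaining < FREELIST_WAKEUP_THRESHOLD)
        atomic_store(&bgwriter_latch_set, true);
}
```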
> * does not advance the clock sweep, so the individual backends will
> touch the same buffers again and transfer all the buffer spinlock
> cacheline around
Yes, I think that should be fixed as part of this patch too. It's
obviously connected to the point about usage counts.
> * The adaptation logic it has is so slow that it takes
> several minutes to adapt.
Yeah. I don't know if fixing that will fall naturally out of these
other changes or not, but I think it's a second-order concern in any
event.
> There's another thing we could do to noticeably improve scalability of
> buffer acquisition. Currently we do a huge amount of work under the
> freelist lock.
> In StrategyGetBuffer:
> LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
> ...
> // check freelist, will usually be empty
> ...
> for (;;)
> {
> buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
>
> ++StrategyControl->nextVictimBuffer;
>
> LockBufHdr(buf);
> if (buf->refcount == 0)
> {
> if (buf->usage_count > 0)
> {
> buf->usage_count--;
> }
> else
> {
> /* Found a usable buffer */
> if (strategy != NULL)
> AddBufferToRing(strategy, buf);
> return buf;
> }
> }
> UnlockBufHdr(buf);
> }
>
> So, we perform the entire clock sweep until we find a single buffer we
> can use inside a *global* lock. At times we need to iterate over the
> whole of shared buffers BM_MAX_USAGE_COUNT (5) times until we have pushed
> down all the usage counts enough (if the database is busy it can take even
> longer...).
> In a busy database where usually all the usage counts are high, the next
> backend will touch a lot of those buffers again, which causes massive
> cache eviction & bouncing.
>
> It seems far more sensible to only protect the clock sweep's
> nextVictimBuffer with a spinlock. With some care all the rest can happen
> without any global interlock.
That's a lot more spinlock acquire/release cycles, but it might work
out to a win anyway. Or it might lead to the system suffering a
horrible spinlock-induced death spiral on eviction-heavy workloads.
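
Roughly, the scheme being proposed could look like this: advancing the
hand is a single atomic increment on nextVictimBuffer, and each backend
then contends only on individual buffer-header locks rather than one
global lock (atomic_flag here stands in for the buffer header spinlock;
the structures are heavily simplified and not from any patch):

```c
#include <stdatomic.h>

#define NBUFFERS 128

typedef struct
{
    atomic_flag hdr_lock;  /* stand-in for the buffer header spinlock */
    int refcount;
    int usage_count;
} BufferDesc;

static BufferDesc buffers[NBUFFERS];

/* The only shared clock-sweep state: a single atomic counter. */
static atomic_uint nextVictimBuffer;

static void lock_buf_hdr(BufferDesc *buf)
{
    while (atomic_flag_test_and_set(&buf->hdr_lock))
        ;  /* spin */
}

static void unlock_buf_hdr(BufferDesc *buf)
{
    atomic_flag_clear(&buf->hdr_lock);
}

/* Clock sweep with no global interlock: concurrent backends take
 * distinct hand positions via fetch_add, so they sweep disjoint
 * buffers instead of serializing behind BufFreelistLock. */
static int clock_sweep(void)
{
    for (;;)
    {
        unsigned victim = atomic_fetch_add(&nextVictimBuffer, 1) % NBUFFERS;
        BufferDesc *buf = &buffers[victim];

        lock_buf_hdr(buf);
        if (buf->refcount == 0)
        {
            if (buf->usage_count > 0)
                buf->usage_count--;
            else
            {
                /* Found a victim; real code would return it with the
                 * header lock still held, we unlock for simplicity. */
                unlock_buf_hdr(buf);
                return (int) victim;
            }
        }
        unlock_buf_hdr(buf);
    }
}
```

This is exactly where the trade-off above bites: the fetch_add is
cheap, but the per-buffer lock/unlock pairs multiply, which is what
could turn into the feared spinlock storm on eviction-heavy workloads.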
> I think even after fixing this - which we definitely should do - having
> a sensible/more aggressive bgwriter moving pages onto the freelist makes
> sense because then backends don't need to deal with dirty pages.
Or scanning to find evictable pages.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company