Re: Wait free LW_SHARED acquisition - v0.9

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Wait free LW_SHARED acquisition - v0.9
Date: 2014-10-14 13:58:43
Message-ID: 20141014135843.GF9267@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-10-14 08:40:49 -0500, Merlin Moncure wrote:
> On Fri, Oct 10, 2014 at 11:00 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2014-10-10 16:41:39 +0200, Andres Freund wrote:
> >> FWIW, the profile always looks like
> >> - 48.61% postgres postgres [.] s_lock
> >> - s_lock
> >> + 96.67% StrategyGetBuffer
> >> + 1.19% UnpinBuffer
> >> + 0.90% PinBuffer
> >> + 0.70% hash_search_with_hash_value
> >> + 3.11% postgres postgres [.] GetSnapshotData
> >> + 2.47% postgres postgres [.] StrategyGetBuffer
> >> + 1.93% postgres [kernel.kallsyms] [k] copy_user_generic_string
> >> + 1.28% postgres postgres [.] hash_search_with_hash_value
> >> - 1.27% postgres postgres [.] LWLockAttemptLock
> >> - LWLockAttemptLock
> >> - 97.78% LWLockAcquire
> >> + 38.76% ReadBuffer_common
> >> + 28.62% _bt_getbuf
> >> + 8.59% _bt_relandgetbuf
> >> + 6.25% GetSnapshotData
> >> + 5.93% VirtualXactLockTableInsert
> >> + 3.95% VirtualXactLockTableCleanup
> >> + 2.35% index_fetch_heap
> >> + 1.66% StartBufferIO
> >> + 1.56% LockReleaseAll
> >> + 1.55% _bt_next
> >> + 0.78% LockAcquireExtended
> >> + 1.47% _bt_next
> >> + 0.75% _bt_relandgetbuf
> >>
> >> to me. Now that's with the client count 496, but it's similar with lower
> >> counts.
> >>
> >> BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
> >> smarter.
> >
> > Which is nearly trivial now that atomics are in. Check out the attached
> > WIP patch which eliminates the spinlock from StrategyGetBuffer() unless
> > there's buffers on the freelist.
>
> Is this safe?
>
> + victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);
>
> - if (++StrategyControl->nextVictimBuffer >= NBuffers)
> + buf = &BufferDescriptors[victim % NBuffers];
> +
> + if (victim % NBuffers == 0)
> <snip>
>
> I don't think there's any guarantee you could keep nextVictimBuffer
> from wandering off the end.

Not sure what you mean? It'll cycle around at 2^32. The code doesn't try
to avoid nextVictimBuffer from going above NBuffers. To avoid overrunning
&BufferDescriptors I'm doing % NBuffers.

Yes, that'll have the disadvantage of the first buffers being slightly
more likely to be hit, but for that to be relevant you'd need
unrealistically large amounts of shared_buffers.

> You could buy a little safety by CAS'ing
> zero in instead, followed by atomic increment. However that's still
> pretty dodgy IMO and requires two atomic ops which will underperform
> the spin in some situations.
>
> I like Robert's idea to keep the spinlock but preallocate a small
> amount of buffers, say 8 or 16.

I think that's pretty much orthogonal. I *do* think it makes sense to
increment nextVictimBuffer in bigger steps. But the above doesn't
prohibit doing so - and it'll still be be much better without the
spinlock. Same number of atomics, but no potential of spinning and no
potential of being put to sleep while holding the spinlock.

This callsite has a comparatively large likelihood of being put to sleep
while holding a spinlock because it can run for a very long time doing
nothing but spinlocking.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-10-14 14:20:15 Re: Drop any statistics of table after it's truncated
Previous Message Merlin Moncure 2014-10-14 13:40:49 Re: Wait free LW_SHARED acquisition - v0.9