Re: Spinlocks, yet again: analysis and proposed patches

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <marko(at)l-t(dot)ee>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spinlocks, yet again: analysis and proposed patches
Date: 2005-09-13 21:18:23
Message-ID: 26803.1126646303@sss.pgh.pa.us
Lists: pgsql-hackers

Marko Kreen <marko(at)l-t(dot)ee> writes:
> Hmm. I guess this could be separated into 2 cases:

> 1. Light load - both lock owner and lock requester won't get
> scheduled while busy (owner in critical section, waiter
> spinning.)
> 2. Big load - either or both of them gets scheduled while busy.
> (waiter is scheduled by OS or voluntarily by e.g. calling select())

Don't forget that the coding rules for our spinlocks say that you
mustn't hold any such lock for more than a couple dozen instructions,
and certainly any kernel call while holding the lock is Right Out.
There is *no* case where the holder of a spinlock is going to
voluntarily give up the CPU. The design intention was that the
odds of losing the CPU while holding a spinlock would be negligibly
small, simply because we don't hold it very long.
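
To make that concrete, correct usage is supposed to look about like
this (the struct is made up for illustration, but SpinLockAcquire and
SpinLockRelease are the real macros from storage/spin.h):

#include "storage/spin.h"

/* hypothetical shared-memory structure, for illustration only */
typedef struct
{
    slock_t     mutex;
    int         counter;
} MySharedState;

static void
bump_counter(volatile MySharedState *shared)
{
    SpinLockAcquire(&shared->mutex);
    shared->counter++;      /* a couple of instructions, no kernel calls */
    SpinLockRelease(&shared->mutex);
}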

> About fast yielding, comment on sys_sched_yield() says:
> * sys_sched_yield - yield the current processor to other threads.
> *
> * this function yields the current CPU by moving the calling thread
> * to the expired array. If there are no other threads running on this
> * CPU then this function will return.

Mph. So that's pretty much exactly what I suspected...

I just had a thought: it seems that the reason we are seeing a
significant issue here is that on SMP machines, the cost of trading
exclusively-owned cache lines back and forth between processors is
so high that the TAS instructions (specifically the xchgb, in the x86
cases) represent a significant fraction of backend execution time all
by themselves. (We know this is the case due to oprofile results,
see discussions from last April.) What that means is that there's a
fair chance of a process losing its timeslice immediately after the
xchgb. Which is precisely the scenario we do not want, if the process
successfully acquired the spinlock by means of the xchgb.
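
(For anyone following along at home, the TAS in question boils down to
roughly this -- a simplified sketch of the i386 code in s_lock.h,
assuming gcc inline asm:)

typedef unsigned char slock_t;

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    /* xchg with a memory operand is implicitly locked on x86 */
    __asm__ __volatile__(
        "	xchgb	%0,%1	\n"
        : "+q"(_res), "+m"(*lock)
        :
        : "memory");
    return (int) _res;      /* 0 = got the lock, 1 = already held */
}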

We could ameliorate this if there were a way to acquire ownership of the
cache line without necessarily winning the spinlock. I'm imagining
that we insert a "dummy" locked instruction just ahead of the xchgb,
which touches the spinlock in such a way as to not change its state.
(xchgb won't do for this, but maybe one of the other lockable
instructions will.) We do the xchgb just after this one. The idea is
that if we don't own the cache line, the first instruction causes it to
be faulted into the processor's cache, and if our timeslice expires
while that is happening, we lose the processor without having acquired
the spinlock. This assumes that once we've got the cache line, the
xchgb that actually does the work can get executed with not much
extra time spent and only low probability of someone else stealing the
cache line back first.
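
Something like this, perhaps (an untested sketch -- "lock addb $0" is
just one candidate for a locked instruction that touches the lock byte
without changing its state):

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    __asm__ __volatile__(
        "	lock			\n"
        "	addb	$0,%1	\n"	/* locked no-op RMW: pulls the cache line
					 * in exclusive mode, changes nothing */
        "	xchgb	%0,%1	\n"	/* the real test-and-set */
        : "+q"(_res), "+m"(*lock)
        :
        : "memory", "cc");
    return (int) _res;
}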

The fact that cmpb isn't helping proves that getting the cache line in a
read-only fashion does *not* do enough to protect the xchgb in this way.
But maybe another locking instruction would. Comments?
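
(For reference, the cmpb pre-test amounts to something like this --
again a sketch, not necessarily the exact code that was tested:)

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    __asm__ __volatile__(
        "	cmpb	$0,%1	\n"	/* read-only peek at the lock byte */
        "	jne	1f		\n"	/* looks busy: fail without bus-locking */
        "	xchgb	%0,%1	\n"	/* looks free: do the real test-and-set */
        "1: \n"
        : "+q"(_res), "+m"(*lock)
        :
        : "memory", "cc");
    return (int) _res;
}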

regards, tom lane
