>> Using futexes directly could be even cheaper.
> Note that below this you only have the futex(2) system call.
I was only referring to the fact that we could save one function call and one
library call, which could make a difference in the uncontended case.
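
To illustrate the kind of thing I mean, here is a minimal sketch of a lock built directly on futex(2), assuming Linux and C11 atomics. The names sketch_lock, sketch_futex, sketch_lock_acquire and sketch_lock_release are made up for this example and are not taken from any existing code; the three-state scheme is the well-known one from the futex literature, not necessarily what we would end up using. The point is only that the uncontended acquire and release are each a single atomic operation with no system call.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type for illustration only.
 * state: 0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */
typedef struct {
    atomic_int state;
} sketch_lock;

/* Thin wrapper around the raw system call (glibc provides no futex() function). */
static long sketch_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void sketch_lock_acquire(sketch_lock *l)
{
    int seen = 0;

    /* Uncontended fast path: one compare-and-swap, no system call. */
    if (atomic_compare_exchange_strong(&l->state, &seen, 1))
        return;

    /* Contended path: mark the lock as having waiters, then sleep
     * in the kernel until the state is no longer 2. */
    if (seen != 2)
        seen = atomic_exchange(&l->state, 2);
    while (seen != 0) {
        sketch_futex(&l->state, FUTEX_WAIT_PRIVATE, 2);
        seen = atomic_exchange(&l->state, 2);
    }
}

static void sketch_lock_release(sketch_lock *l)
{
    /* Uncontended fast path: the state was 1, so nobody is waiting
     * and the decrement to 0 is all that is needed. */
    if (atomic_fetch_sub(&l->state, 1) != 1) {
        /* The state was 2: reset it and wake one waiter. */
        atomic_store(&l->state, 0);
        sketch_futex(&l->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

If these were inlined at the call site, the uncontended case would involve neither a library call nor a system call; the futex(2) call only happens once there is actual contention. Whether that saving is measurable in practice is something we would have to benchmark.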