You can use any thread synchronization primitive to build any other one. Here’s how:

Reducing into Event

First, the primitive needs to support blocking the caller until some condition occurs. With that, you can create a new primitive called an Event: a type where one thread calls event.wait(), which suspends the caller until another thread calls event.set(). Think of it as waiting for a bool to become true.

For example, here’s how to do it with POSIX threading primitives: with a Mutex and Condvar it’s pretty straightforward. Wrap a bool in the mutex, have wait() call cond.wait() until the bool is set, and have set() flip it and call cond.signal(). If only a Mutex is available, locking it up front effectively turns it into a binary semaphore: a subsequent lock() becomes event.wait() and unlock() becomes event.set(). An RwLock can stand in for the Mutex here to the same effect.
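
To make that concrete, here is a minimal sketch in Rust. The Event type, its field names, and the main() demo are just illustration under the Mutex + Condvar approach, not any particular library’s API:

use std::sync::{Condvar, Mutex};

// A one-shot Event: wait() blocks until set() has been called.
#[derive(Default)]
pub struct Event {
    is_set: Mutex<bool>,
    cond: Condvar,
}

impl Event {
    pub fn wait(&self) {
        let mut is_set = self.is_set.lock().unwrap();
        // Loop to tolerate spurious wakeups from the condvar.
        while !*is_set {
            is_set = self.cond.wait(is_set).unwrap();
        }
    }

    pub fn set(&self) {
        *self.is_set.lock().unwrap() = true;
        self.cond.notify_all();
    }
}

fn main() {
    let event = std::sync::Arc::new(Event::default());
    let waiter = {
        let event = event.clone();
        std::thread::spawn(move || {
            event.wait(); // suspends until the main thread calls set()
            println!("event was set");
        })
    };
    std::thread::sleep(std::time::Duration::from_millis(100));
    event.set();
    waiter.join().unwrap();
}

The pre-locked Mutex variant is the same idea with even less state: the Event starts out locked, set() unlocks it, and wait() is simply another lock() call.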

For a more practical example, here’s how to build it from OS threading primitives: most systems provide a Futex API, where wait(&atomic_int, value) blocks the caller if the atomic still matches value, until woken, and wake(&atomic_int, N) unblocks up to N threads waiting on that atomic (after its value has presumably been updated). Some OSes, like Windows and NetBSD, instead provide Events directly, via NtWaitForAlertByThreadId/NtAlertThreadByThreadId and lwp_park/lwp_unpark respectively.
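
As a rough sketch of the Futex route, here’s a Linux-only version that calls the futex syscall through the libc crate (error handling and timeouts omitted; a sketch, not a production Event):

use std::sync::atomic::{AtomicU32, Ordering};

// A one-shot Event over a futex word: 0 = unset, 1 = set.
pub struct Event {
    state: AtomicU32,
}

impl Event {
    pub const fn new() -> Self {
        Self { state: AtomicU32::new(0) }
    }

    pub fn wait(&self) {
        while self.state.load(Ordering::Acquire) == 0 {
            // Sleep only if the word is still 0; set() will wake us.
            unsafe {
                libc::syscall(
                    libc::SYS_futex,
                    &self.state as *const AtomicU32,
                    libc::FUTEX_WAIT | libc::FUTEX_PRIVATE_FLAG,
                    0u32,
                    std::ptr::null::<libc::timespec>(),
                );
            }
        }
    }

    pub fn set(&self) {
        self.state.store(1, Ordering::Release);
        unsafe {
            // Wake every thread blocked in wait() above.
            libc::syscall(
                libc::SYS_futex,
                &self.state as *const AtomicU32,
                libc::FUTEX_WAKE | libc::FUTEX_PRIVATE_FLAG,
                i32::MAX,
            );
        }
    }
}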

For a threading primitive where the set() caller must block until a matching wait() (e.g. NtKeyedEvents or Rust’s Barrier), it can be made non-blocking by putting an atomic bool in front: both wait() and set() atomically swap the bool to true. If wait() sees false, set() hasn’t arrived yet, so wait() blocks on the primitive. If set() sees true, wait() is (or soon will be) blocking, so set() uses the primitive to unblock it.
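
Here is one contrived but concrete version of that trick in Rust, using a two-party std::sync::Barrier as the "set() must meet wait()" primitive with an AtomicBool in front of it (again just a sketch):

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Barrier;

// Event built on a primitive where both sides must meet: a 2-party Barrier.
// The AtomicBool decides whether a rendezvous is needed at all.
pub struct Event {
    notified: AtomicBool,
    rendezvous: Barrier,
}

impl Event {
    pub fn new() -> Self {
        Self {
            notified: AtomicBool::new(false),
            rendezvous: Barrier::new(2),
        }
    }

    pub fn wait(&self) {
        // false: set() hasn't arrived yet, so block at the rendezvous until it does.
        if !self.notified.swap(true, Ordering::AcqRel) {
            self.rendezvous.wait();
        }
    }

    pub fn set(&self) {
        // true: wait() is (or will be) blocked, so meet it at the rendezvous.
        // false: wait() hasn't arrived and won't block when it does,
        // so set() returns immediately instead of blocking like a raw Barrier would.
        if self.notified.swap(true, Ordering::AcqRel) {
            self.rendezvous.wait();
        }
    }
}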

Expanding from Event

Now that we have an Event, it can be used to build any other sync primitive. Let’s start with a Mutex:

Pseudo-code for a simplistic, Event-based Mutex
type Node:
    next: ?*Node   // next waiter in the stack
    event: Event   // blocks/unblocks this waiter

// state is a tagged pointer: bit 0 is the "locked" bit,
// the remaining bits point to the top of a stack of waiting Nodes.
type Mutex(state: Atomic(usize) = 0):
    lock():
        s = state.load(Relaxed)
        loop:
            // Unlocked: try to grab the lock by setting bit 0.
            while s & 1 == 0:
                s = state.cas(s, s | 1, Acquire, Relaxed) orelse return
            // Locked: push ourselves onto the waiter stack and block,
            // then retry from the freshly loaded state once woken.
            node = Node{ next: ?*Node(s & ~1), event: Event{} }
            s = state.cas(s, usize(&node) | 1, Release, Relaxed) orelse:
                node.event.wait()
                s = state.load(Relaxed)
                continue

    unlock():
        // Fast path: locked with no waiters, just clear the bit.
        s = state.cas(1, 0, Release, Acquire) orelse return
        loop:
            // Pop the top waiter (which also clears the locked bit)
            // and wake it so it can retry lock().
            node = *Node(s & ~1)
            s = state.cas(s, usize(node.next), Release, Acquire) orelse:
                node.event.set()
                return
If you’re curious how this works, check out my previous post on Building a Tiny Mutex. With just an Event (plus, optionally, a Mutex), a Futex API can be built using similar tricks; a sketch of this follows the examples below. From a Futex, all the other primitives can be made (as seen in glibc and musl on Linux). Here are some examples:

Pseudo-code for simplistic, Futex-based POSIX primitives
// state: 0 = unlocked, 1 = locked, 2 = locked with (possible) waiters.
type Mutex(state: Atomic(u32)):
    lock():
        // Fast path: uncontended acquire.
        _ = state.cas(0, 1, Acquire, Relaxed) orelse return
        // Contended: mark the lock as having waiters and sleep while it stays held.
        while state.swap(2, Acquire) != 0:
            Futex.wait(&state, 2)
    unlock():
        // Only issue a wake if someone may be waiting.
        if state.swap(0, Release) == 2:
            Futex.wake(&state, 1)

// state is a generation counter bumped by every signal/broadcast.
type Condvar(state: Atomic(u32)):
    wait(mutex):
        s = state.load(Relaxed)
        mutex.unlock()
        // Sleeps only if no wakeup happened since s was loaded.
        Futex.wait(&state, s)
        mutex.lock()
    signal(): _wake(1)
    broadcast(): _wake(max(u32))
    _wake(n):
        _ = state.fetchAdd(1, Release)
        Futex.wake(&state, n)
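
And for the middle step mentioned above, a Futex API built from an Event plus a Mutex, a simplistic Rust sketch could look like the following. It keeps a single global wait queue where a real implementation would hash addresses into many buckets, the futex_wait/futex_wake names are just for illustration, and the Condvar-based Event from earlier is repeated to keep the block self-contained:

use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Condvar, Mutex};

// The Condvar-based Event from earlier, reused as the blocking mechanism.
#[derive(Default)]
struct Event {
    is_set: Mutex<bool>,
    cond: Condvar,
}

impl Event {
    fn wait(&self) {
        let mut is_set = self.is_set.lock().unwrap();
        while !*is_set {
            is_set = self.cond.wait(is_set).unwrap();
        }
    }
    fn set(&self) {
        *self.is_set.lock().unwrap() = true;
        self.cond.notify_all();
    }
}

// One global wait queue: the address being waited on plus the Event that
// unblocks the waiter. A real futex hashes addresses into many such buckets.
struct Waiter {
    addr: usize,
    event: Arc<Event>,
}

static QUEUE: Mutex<Vec<Waiter>> = Mutex::new(Vec::new());

// Block the caller while *ptr still holds `expect`.
fn futex_wait(ptr: &AtomicU32, expect: u32) {
    let event = Arc::new(Event::default());
    {
        let mut queue = QUEUE.lock().unwrap();
        // Check under the queue lock so a concurrent futex_wake can't be missed.
        if ptr.load(Ordering::Acquire) != expect {
            return;
        }
        queue.push(Waiter {
            addr: ptr as *const AtomicU32 as usize,
            event: event.clone(),
        });
    }
    event.wait();
}

// Unblock up to `n` threads waiting on the same address.
fn futex_wake(ptr: &AtomicU32, n: usize) {
    let mut woken = Vec::new();
    {
        let mut queue = QUEUE.lock().unwrap();
        let addr = ptr as *const AtomicU32 as usize;
        let mut i = 0;
        while i < queue.len() && woken.len() < n {
            if queue[i].addr == addr {
                woken.push(queue.swap_remove(i).event);
            } else {
                i += 1;
            }
        }
    }
    // Set the Events outside the queue lock.
    for event in woken {
        event.set();
    }
}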

It’s all pretty neat, but does it make sense to do this in practice? Yes.

Golang implements all of its blocking using an Event within each goroutine, where wait() is gopark and set() is goready. A Futex-style implementation (exposed as a semaphore) is then written using that per-goroutine Event, and the other sync primitives are built on top of the semaphore API.

Also, a while back I wrote a Rust crate called µSync which implements all of the Rust standard library (stdlib) sync primitives using an Event based on std::thread::park(). Unlike Golang, however, it skips having an intermediary Futex and instead builds each primitive directly on top of Event. The benchmarks showed it matching or outperforming the stdlib implementations at the time, and some of the strategies wound up being contributed upstream (thanks joboet!).
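
For a feel of what an Event based on std::thread::park() looks like, here is a stripped-down, single-waiter sketch (nothing like the crate’s actual code, just the basic shape):

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::thread::{self, Thread};

// A single-waiter Event over std::thread::park/unpark.
#[derive(Default)]
struct Event {
    is_set: AtomicBool,
    waiter: Mutex<Option<Thread>>,
}

impl Event {
    fn wait(&self) {
        // Publish our handle so set() knows which thread to unpark.
        *self.waiter.lock().unwrap() = Some(thread::current());
        while !self.is_set.load(Ordering::Acquire) {
            // park() may wake spuriously, so the flag is always re-checked.
            thread::park();
        }
    }

    fn set(&self) {
        self.is_set.store(true, Ordering::Release);
        if let Some(thread) = self.waiter.lock().unwrap().take() {
            thread.unpark();
        }
    }
}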

So with one you can always make the others. The more interesting question is whether that’s a good idea...

I think so, at least.