
User level spin locks revisited


This article was written quite a few years ago, but recent activities made it worthwhile to take another look at it, revise the content, and clear up some of the inaccuracies.

It starts with a quote and excerpt from the book Inside Microsoft Windows 2000, Third Edition, by David Solomon and Mark Russinovich…

The concept of mutual exclusion is a crucial one in operating systems development. It refers to the guarantee that one, and only one, thread can access a particular resource at a time. Mutual exclusion is necessary when a resource doesn’t lend itself to shared access or when sharing would result in an unpredictable outcome… If one thread reads a memory location while another writes to it, the first thread will receive unpredictable data… The same error could occur on a single processor system if the operating system were to perform a context switch while the two threads are accessing this non-shareable resource or critical section… The biggest area of concern is interrupts. The kernel (or a thread) might be updating a global data structure when an interrupt occurs whose interrupt-handling routine also modifies the structure.

What is a spin lock?

A spin lock is an inter-process, inter-thread, or inter-task synchronization mechanism, like a mutex, a semaphore, or a (Win32) critical section. Spin locks are often used in symmetric multiprocessor (SMP) operating system kernels, because it is sometimes more efficient to constantly poll for the availability of a lock than to pay for…

  • Pre-empting the thread (doing the context switch) and scheduling another thread.
  • Maintaining and updating the extra housekeeping structures (“who” owns the synchronization object or lock, for how long, has the timeout interval elapsed, …).
  • Doing the extra work (C function calls or even assembler routines) for a fair first-in-first-out mechanism.
  • Doing a system call to switch from user space to kernel space.

In short,

a spin lock is a kind of busy waiting; it causes the execution unit – which is trying to acquire the lock – to simply loop while repeatedly checking if the lock is available.

Compared with, e.g., the Win32 CRITICAL_SECTION structure and its functions (or pthread mutexes, for that matter), and certainly with more advanced mutexes and semaphores, the simple spin lock proposed here is less fair: there is no protection against the theoretical “starvation” of a thread that waits forever for the resource to unlock.

It is also more “unsafe” for your programs: if a thread holding the lock unexpectedly dies (without freeing the lock), the presented spin lock has no way of knowing this. The threads waiting on the resource cannot recover and will “spin forever”, just burning CPU cycles.

But if used very carefully, a spin lock is in certain situations a lot faster than any other synchronization mechanism your operating system has to offer (unless it also provides user accessible Spin Locks, of course). You can use this lock mechanism in software running on top of an OS, and also on a system without an OS where you have to protect queues, buffers, lists, structures, etc… against other threads and interrupt routines, which can leave your shared data in an inconsistent state. The spin lock can also be used in SMP systems (with at least 2 cores) mainly to protect small and short critical sections.

All you have to do is port one function to your specific CPU platform.

When first investigating these synchronization issues for no-OS embedded systems, I came across software locking mechanisms such as Peterson’s algorithm (for protecting a critical section against data corruption by 2 competing processes or tasks) and Lamport’s bakery algorithm (for protecting a critical section against data corruption by N competing processes or tasks). Both solutions (and their derivatives) assume that reads and writes to the shared memory region are “atomic”: that they happen in a non-interruptible sequence of CPU instructions (preferably a single instruction), which is not the case on some hardware platforms. However, the aforementioned algorithms can be used successfully to build locks when an atomic test-and-set instruction is not available in hardware.

This brings us to the standard way of avoiding data corruption in the no-OS-world when memory is shared between interrupt routines and a ‘normal’ – not-interrupt-driven – part of the program (what is mostly referred to as the “main loop”):

Temporarily disable interrupts from within the main loop (or the running thread) while manipulating the shared resource. In extreme cases, where the time spent in the critical section is too long, this can lead to loss of events, interrupts, or data.

The solution to this problem is to find a processor-specific test-and-set or swap instruction that is SMP-safe and executes as one non-interruptible sequence. In the literature, this is referred to as the “test-and-set” method.

The Algorithm

The hardware algorithm for a binary semaphore using the test-and-set method:


wait(s)
{
  do
  {
    old = test_and_set(s, 0); // atomically: old = s; s = 0
    if (old == 1)
    {
      break; // we obtained the lock
    }
  }
  while (true);
}

explanation of the wait function

  • We atomically set the variable ‘s’ to 0 and obtain its previous value.
  • If the previous value of ‘s’ was 1, we acquired the lock. If it was 0, some other task already holds the lock and we do not have it. We will have to wait for it, by retrying the test until it succeeds.

signal(s)
{
  set s = 1; // release the lock
}

explanation of the release function

  • We assume that the lock was previously obtained, so the value of ‘s’ is 0.
  • To release the lock, we simply need to make ‘s’ 1 again.

Spin lock usage

When should you use a spin lock?

Only when it makes sense not to do a context switch in the event of a locked resource:

  • The time the thread has to wait is smaller than the amount of time to do the context switch. (This is only true for SMP operating systems running on SMP hardware! If there is no real parallelism and you have a pre-emptive OS, you should always prefer a context switch. Which thread is going to free the resource anyway on a uni-processor system?)
  • The chance of a resource conflict (= the chance of having to actually wait for the lock) in the critical section is either very small or the average waiting time is less than the time needed for the OS to do the context switching.
  • The critical section itself is very small and the number of checks is very high.
  • You don’t have another synchronization mechanism because you do not have an OS.

Benefits of using a spin lock and conclusion

It depends on the situation, of course, but the main reason is speed.
It is up to the potential user of these spin locks to decide which applications are best suited for them.

Spin locks can either be used on an embedded device without an operating system, or they can be used – with care – on an SMP machine with an SMP operating system. The mechanism is used to enforce mutual exclusion for shared resources. It is not recommended to use (blocking) spin locks on uni-processor systems (but one could use a ‘try’-enter-lock and, if the lock is not obtained, execute an alternate code path). The key question is whether the developer will let the scheduler intervene in case of a locked “shared resource” or “critical section”. If it is decided to “spin” the lock, the developer must be certain that performance will be gained by not triggering the context switch.

For more information about spin locks, I refer to Wikipedia, where you will also find a reference to the original article on CodeProject. Life is a circle, so to speak 😉

Addendum

Mr. Pierre Couderc added the following comment on CodeProject, which I cannot rebut: “It could be debated if your assertion about the non atomicity of read and write operations is true or not. It is a question of definition, but atomicity is usually defined as the fact that the operation (read or write in memory, in this case) is done in a non interruptable “cycle” (either 1 cycle, either many cycles with the bus locked). It is the operation on the bus that is considered and not the instruction on the processor. And in this sense, the i386 like any processor at my knowledge have atomic read and write. It has too an atomic TestAndSet. Peterson and Lamport algorithms were done for computers on which read and write are supposed to be atomic, but where no assumption is done about TestAndSet. So these algorithm are perfectly valid on i386 but not so useful on modern computers that have hardware ways (TestAndSet) to solve these problems.”

On recent Windows platforms, test-and-set can be done by calling the InterlockedExchange function. This opens the path to very simple user-level spin locks. Also, SetCriticalSectionSpinCount can increase the performance of the Win32 EnterCriticalSection function.

When implementing and using this spin lock on your platform, be aware of cache issues, memory barriers and instruction reordering effects!

About rtos.be

rtos.be is a joint initiative of Gert Boddaert and Pieter Beyens.
More info: about us, our mission, contact us.
