Scheduler

The scheduler is the kernels core component. It manages thread states and handles most system calls from threads.

The goal of scheduling is to execute time-critical threads before the deadline and to share the CPU between multiple threads, so that no thread ever starves. When the scheduler decides to run another thread, the CPU context is switched, leading to overhead. Context switch overhead is the biggest downside of an RTOS and must be minimized.

Preemptive

Whenever an interrupt is triggered on a microcontroller, the main program is preempted and the ISR is executed. The main program resumes afterwards. A preemptive scheduler does the same, but with threads.

sched-preempt
Figure: Priority based preemptive scheduling.

In the figure an interrupt 1 updates a resource T2 has been waiting for. Because T2 has a higher priority than the running thread T1, the scheduler will switch thread contexts and execute T2 instead 2. After T2 finished, it goes to sleep 3 and the scheduler resumes T1. If T1 sleeps as well, the idle thread will be executed. The idle has the lowest priority and will always run when all threads are suspended.

Preemptive scheduling is the default for most RTOS because in an embedded system, priorities can be assigned based on real-time criticality. A phase current measurement handler for example is more critical than updating a graphical user interface on a display.

Round-Robin (Time-Slicing)

If two or more threads run with the same priority, they cannot preempt each other. To still share the CPU, the threads can take turns.

sched-time-sliced
Figure: Round-robin scheduling / time slicing.

the figure shows three threads with the same priority. A thread (T1) is run at max one system tick. After one tick 1, the scheduler switches to the next thread (T2). A thread can also voluntarily yield the CPU 2. The scheduler then switches to the next thread (T3) 3 that cannot run for a full time-slice because the system tick triggers a thread switch.

RTOSs apply different approaches to time-slicing, in FreeRTOS, it can be enabled or disabled for all priorities [25] while in Azure RTOS time-slicing can be controlled per thread [39].

The kernel enables time-slicing for all threads by default in the kernel, but it can be disabled in the Cargo manifest.

Thread States

Scheduling threads strictly follows the finite state machine shown in the figure.

thread-sm
Figure: Thread finite state machine; number within state: number of threads allowed in this state; red switch icon: thread switch out, blue switch icon: thread switch in.

A thread can have one of five states:

Ready The thread is ready and waiting for a time slot to run on the CPU. This is the default state after a thread is spawned.

Running There can always be exactly one thread running per CPU core. On a context switch, the running thread is moved to its new state (red switch icon) according to its transition flag and the highest priority thread is moved from ready to running (blue switch icon). The only way for a thread to enter the running state is through the ready state. If no thread is ready to run, the idle thread will be executed.

Sleeping If the running thread requests to be suspended until a point in time, it will be placed in the sleeping state.

Blocked If the thread requests to be suspended until an event occurs, it will be placed in the blocked state.

Terminated A thread can exit by choice or when it triggers an exception (i.e., memory protection violation). In the current implementation, a terminated thread cannot be launched again. In a future implementation, the user should be able to choose the behavior upon thread termination.

On a microcontroller, a thread typically cycles through suspended to ready, then running and back to the suspended state because a thread often runs periodically (waiting of wake-up time) or asynchronically (waiting for an event).

The thread finite state machine is not explicitly programmed, but is rather the implicit behavior within the scheduler. The states are represented by doubly-linked lists shown in the figure and events are programmed in different sections in the scheduler (e.g., context switch, timer tick, event fire).

thread-list
Figure: Thread list in the scheduler (event list omitted, see the figure).

For a scheduler to work efficiently, it must keep context switches as short as possible. A doubly linked list with head and tail pointer is used to push an element to a list quickly. Multiple ready lists with decreasing priority lists minimize the search time for the ready thread with the highest priority. The list of sleeping threads is sorted by wake-up time to avoid having to look through all threads. The event list (separately shown in the figure) is again sorted by priority to shorten context switch times. If threads have the same priority, the order is based on first come, first served.

Minimizing the number of priorities in an RTOS application is important, because for every priority, a ready list is be allocated and searched for a ready thread on every context switch. The time to find a ready thread depends on its priority. But it can be determined deterministically. In the worst case, the lowest priority thread will be exchanged with another lowest priority ready thread. That will take twice the time to look through $n$-priorities list heads (once to push the running thread to a ready list and once to pop the highest priority ready thread).

Using multiple doubly linked lists is a common approach to manage threads as done in μC/OS-III, Zephyr and FreeRTOS [25]. Only Zephyr offers alternative data structures such as red-black tree to manage many threads [61].

Context Switch

The context of a thread consists of any volatile information that is specific to one thread, e.g.

CPU registers
Floating-point registers
Memory access configuration

On a function call, interrupt or exception the hardware pushes/pops some registers from/to the stack. In cooperative scheduling, a thread exits its entry function every time, thus stacking is completely done in hardware. However, if a thread should be preempted before it finished, some registers must be stored in software.

Because the number of registers to be stored in software depends on the hardware, a context switch is partially done in assembly in the architecture-specific implementation and the selection of the next thread is done in Rust.

context-switch
Figure: Context Switch on a CPU with Armv7E-M architecture.

the figure illustrates a context switch triggered by sleep call on an Arm Cortex-M4F (Armv7E-M) core. The stack pointer decreases as the stack grows, i.e., a lower stack pointer in the figure refers to more stack being used.

The thread is not allowed to change its own wake-up time, so it makes a system call using the SuperVisor Call (SVC) instruction. The SVCall exception has the highest priority behind system faults and will be called exactly after the SVC instruction [24, pp. 330]. Triggering the exception will cause the CPU to push 1 the exception stack frame (R0-R3, R12, Link Register (LR), Program Counter (PC), Program State Register (xPSR) and S0-S15 if the floating-point unit is used) to thread A stack. The stack pointer is switched to the main stack pointer and to handler mode automatically. The system call 2 has now full access and can set the threads wake-up time, the transition flag to Sleeping and trigger a pendable service (PendSV) call. The PendSV exception is set to a low priority by the kernel, so that time-critical interrupts can run before the context switch is complete. The PendSV handler first stores 3 the remaining registers (R4-R11, S16-S31 if the floating-point unit is used) to the thread stack in software.

The handler then calls the switch_thread function from the generalized Rust implementation of kernel. The function stores the current stack pointer of the thread and chooses the next thread according to the scheduling algorithm 4. From the next thread, it loads the stack pointer and adjusts the memory protection settings. The new stack pointer is returned to the PendSV handler.

The handler pops 5 the registers pushed in software and returns from the exception. The CPU now pops 6 exception stack frame and continues thread B 7 where it left of. [24, pp. 339]