A thread, sometimes called a task in embedded software, represents a unit of work. It has an entry function that typically contains an infinite loop. A thread either runs periodically (isochronous) or is triggered by an event (asynchronous). On an embedded system, threads spend most of their time waiting for a specific time or an event. The main benefit of an RTOS for the user comes from separating work into scalable units and providing a unified communication system between them. One or more threads and a memory region together form a process.

### Design

Bern RTOS uses a parallelism abstraction known from general-purpose computers: processes and threads. At least one process runs on the RTOS, and each process contains one or more threads. However, there are some differences from a general-purpose operating system (GPOS). As on a GPOS, processes are isolated from each other, but there are no virtual addresses on a microcontroller: all memory and peripherals are mapped into one continuous address space. Also, on a GPOS, processes are started individually, while on Bern RTOS, all processes are compiled at the same time and linked into one image.

Figure: Resource organization with process and thread tables in the kernel [19, p. 109].

Processes and management structures are separated in Bern RTOS. The figure shows threads running within processes in user mode. The accompanying management structures (process and thread table) are stored in kernel memory.

Figure: Memory accessible from Thread Y.

The separation of process memory is illustrated as a memory map in the figure. Every process and the kernel have their own block of memory for data. The CPU starts on the main stack, which is later used by the kernel and can no longer be accessed by processes. The memory size of a process is defined at compile time. In the example of process B, some static data is placed in process memory. The rest is automatically available to the process allocator, which can be used to allocate thread stacks. Each stack has an overflow barrier to prevent data corruption caused by a stack overflow.

A thread can access data on its own stack as well as on the stacks of other threads in the same process. Globally shared data and static data within the process are also accessible. However, a thread will be terminated if it attempts to read or modify data on the main stack, in kernel memory, or in any other process. Zephyr's user mode memory domains [61] inspired the behavior of the process memory.

In most RTOSes, a thread can access anything in RAM, including the thread/task control blocks (TCB) [9, pp. 113]. The TCB contains the state of a thread, including its priority. For an RTOS to be secure, control structures such as the TCB must be inaccessible to threads, so that, for example, priorities cannot be changed directly.

Figure: Thread stack and control block.

As shown in the figure, a thread can access its own stack. Only the kernel stores and accesses the thread control block, called Thread in Bern RTOS. The Thread structure contains an owning pointer to the stack, privilege information including priority and memory access configuration, as well as transitions and wake-up times used by the scheduler's finite state machine.

The fields of the Thread structure can never be directly accessed; they are set via system calls, e.g., bern_kernel::sleep(), bern_kernel::thread_exit(), mutex.lock(100). It is the responsibility of the kernel to validate system calls before executing them. There are no thread handles at the moment, thus threads can only change their own state.
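The idea that control-block fields are only changed through validated calls can be illustrated on a host machine. The following is a minimal sketch in plain Rust, not Bern RTOS code: the names `kernel::Kernel`, `ThreadControl`, `sys_sleep`, and the limit `MAX_SLEEP_TICKS` are hypothetical and chosen only for illustration.

```rust
mod kernel {
    /// Hypothetical scheduler states; the real kernel may use different ones.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum State {
        Ready,
        Running,
        Sleeping { wake_at: u64 },
    }

    /// Stand-in for the kernel-owned control block. The fields are private
    /// to this module, so code outside cannot write them directly.
    pub struct ThreadControl {
        priority: u8,
        state: State,
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum SyscallError {
        InvalidArgument,
    }

    /// Toy kernel. It tracks the running thread, so calls need no handle.
    pub struct Kernel {
        now: u64,
        running: usize,
        threads: Vec<ThreadControl>,
    }

    impl Kernel {
        /// Assumed upper bound, used only to demonstrate validation.
        pub const MAX_SLEEP_TICKS: u64 = 1_000_000;

        /// Creates one thread per priority; the first one is running.
        pub fn new(priorities: &[u8]) -> Self {
            assert!(!priorities.is_empty(), "at least one thread is required");
            let threads = priorities
                .iter()
                .enumerate()
                .map(|(i, &priority)| ThreadControl {
                    priority,
                    state: if i == 0 { State::Running } else { State::Ready },
                })
                .collect();
            Kernel { now: 0, running: 0, threads }
        }

        /// "System call": validates the argument, then updates the state
        /// of the calling (running) thread only.
        pub fn sys_sleep(&mut self, ticks: u64) -> Result<(), SyscallError> {
            if ticks > Self::MAX_SLEEP_TICKS {
                return Err(SyscallError::InvalidArgument);
            }
            let wake_at = self.now + ticks;
            self.threads[self.running].state = State::Sleeping { wake_at };
            Ok(())
        }

        pub fn state_of_running(&self) -> State {
            self.threads[self.running].state
        }

        pub fn priority_of_running(&self) -> u8 {
            self.threads[self.running].priority
        }
    }
}
```

Because the fields are private to the module, the only way for outside code to change a thread's state is through `sys_sleep`, which rejects invalid requests before touching the control block.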

Threads in Bern RTOS are spawned using closures (see usage subsection). A closure can capture its environment, which is placed on the main stack. The closure generates an anonymous structure without memory layout guarantees [48]. Because the thread will not be able to access the main stack, the data structure must be moved to the thread stack, (1) in the figure. Every closure has a different type; to launch threads without making the kernel generic over each closure type, a trait object (2) is used. A trait object in Rust is referenced through a fat pointer that points to the data structure and to its methods (vtable) [4]. The closure can now be entered through the trait object without the use of generic types, which is done in the entry() function.
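The type erasure described here can be tried on a host machine. The sketch below is plain Rust and not the Bern RTOS implementation: `entry`, `run_erased`, and `fat_pointer_words` are hypothetical names. It shows a non-generic function entering an arbitrary closure through a trait object, and that a reference to a trait object occupies two machine words (data pointer and vtable pointer).

```rust
use std::mem::size_of;

/// Non-generic entry point: runs any closure through a trait object.
fn entry(runnable: &mut dyn FnMut()) {
    runnable();
}

/// Erases the concrete (anonymous) closure type and calls it via `entry`.
fn run_erased<F: FnMut()>(mut closure: F) {
    // Unsized coercion: `&mut F` becomes a fat pointer `&mut dyn FnMut()`.
    let erased: &mut dyn FnMut() = &mut closure;
    entry(erased);
}

/// Size of a trait-object reference, measured in machine words.
fn fat_pointer_words() -> usize {
    size_of::<&mut dyn FnMut()>() / size_of::<usize>()
}
```

Only `run_erased` is generic; `entry` is compiled once and works for every closure, which is what allows a kernel to start threads with different closure types through a single function.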

Data alignment must be considered when copying data manually to a stack (3). On an Arm Cortex-M CPU (Armv7E-M), for example, the compiler will optimize the fetch operation of the trait object (two 32-bit addresses) to a single load double-word instruction (ldrd, [24, p. 135]).
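The alignment arithmetic involved in placing a value at the top of a downward-growing stack can be sketched as follows. The helper names `align_down` and `place_at_top` are hypothetical, and the addresses in the usage note are example values only.

```rust
use std::mem::{align_of, size_of};

/// Rounds `addr` down to a multiple of `align` (must be a power of two).
fn align_down(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    addr & !(align - 1)
}

/// Address at which a value of type `T` would be placed when it is copied
/// to the top of a stack that grows downwards from `stack_top`.
fn place_at_top<T>(stack_top: usize) -> usize {
    // Reserve space first, then round down so the result is aligned for `T`.
    align_down(stack_top - size_of::<T>(), align_of::<T>())
}
```

For instance, placing a pair of 32-bit words below the unaligned address `0x2000_0FFE` yields `0x2000_0FF4`, which is word-aligned and therefore suitable for a double-word load.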

The usable part of the thread stack starts below the closure (4). At thread creation, the initial register values are put on the stack. These include a program counter pointing at the entry() function, with the trait object loaded as a parameter in the registers (5). For the kernel, starting a thread is the same as resuming a previously running thread: it switches the stack and pops the registers.
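A possible initial register image is sketched below. The upper eight words follow the exception frame that Armv7-M hardware pops on exception return (r0 to r3, r12, lr, pc, xPSR). Placing eight software-saved registers (r4 to r11) directly below it is an assumption made for this sketch, as is the function name `initial_frame`; the source does not state how Bern RTOS arranges them.

```rust
/// Thumb state bit in xPSR; it must be set on Cortex-M.
const XPSR_THUMB: u32 = 1 << 24;

/// Builds the initial register image of a new thread, lowest address first.
///
/// `data_ptr` and `vtable_ptr` are the two words of the trait object, which
/// end up in r0 and r1 as the first argument of the entry function.
fn initial_frame(entry: u32, data_ptr: u32, vtable_ptr: u32, exit: u32) -> [u32; 16] {
    let mut frame = [0u32; 16];
    // frame[0..8]: r4..r11, software-saved (assumed position), left at zero.
    frame[8] = data_ptr; // r0
    frame[9] = vtable_ptr; // r1
    // frame[10..13]: r2, r3, r12 stay zero.
    frame[13] = exit; // lr: where execution continues if the entry returns
    frame[14] = entry & !1; // pc: entry address with the Thumb bit cleared
    frame[15] = XPSR_THUMB; // xPSR
    frame
}
```

With such an image on the stack, the first switch to the thread needs no special case: restoring the registers from the frame starts execution at the entry address.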

After some time, the stack will contain local variables and, when the thread is paused, the current set of registers (6). If a thread attempts to grow the stack below the bottom mark, an exception invokes the kernel to deal with the issue.

### Usage

Creating a thread follows the builder pattern, in the same way as freertos-rust [7]. The thread builder accepts any settings for the new thread. The kernel initializes the thread and its stack, partly because only the kernel can access the stack of the new thread. The builder pattern also allows running threads to spawn new threads.
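The builder pattern itself can be sketched on a host machine with the standard library. `ThreadBuilder` below is a hypothetical type, not the Bern RTOS builder; it wraps `std::thread::Builder` and passes the priority to the entry closure only to make the setting observable, since standard library threads have no priority.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical host-side builder that collects thread settings.
pub struct ThreadBuilder {
    priority: u8,
    stack_size: usize,
}

impl ThreadBuilder {
    /// Starts with default settings that can be overridden step by step.
    pub fn new() -> Self {
        ThreadBuilder { priority: 255, stack_size: 64 * 1024 }
    }

    /// Each setter takes and returns the builder, so calls can be chained.
    pub fn priority(mut self, priority: u8) -> Self {
        self.priority = priority;
        self
    }

    pub fn stack_size(mut self, bytes: usize) -> Self {
        self.stack_size = bytes;
        self
    }

    /// Consumes the builder, so it cannot be reused after spawning.
    pub fn spawn<F, T>(self, entry: F) -> io::Result<JoinHandle<T>>
    where
        F: FnOnce(u8) -> T + Send + 'static,
        T: Send + 'static,
    {
        let priority = self.priority;
        thread::Builder::new()
            .stack_size(self.stack_size)
            .spawn(move || entry(priority))
    }
}
```

A call chain such as `ThreadBuilder::new().priority(0).stack_size(128 * 1024).spawn(|p| p + 1)` reads similarly to the listing in this section: settings first, the entry closure last.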

    static PROC: &Process = bern_kernel::new_process!(my_process1, 8192); // (1)
    static mut MY_BUFFER: [u8; 16] = [0; 16]; // (2)

    #[entry]
    fn main() -> ! {
        let mut board = Board::new();
        board.led.set_high();

        bern_kernel::kernel::init(); // (3)
        bern_kernel::time::set_tick_frequency(
            1.kHz(),
            board.sysclock().Hz()
        );

        let mut led = board.led;
        PROC.init(move |c| { // (4)
            Thread::new(c) // (5)
                .priority(Priority::new(0)) // (6)
                .stack(Stack::try_new_in(c, 1024).unwrap()) // (7)
                .spawn(move || { // (8)
                    loop {
                        led.set_high(); // (9)
                        bern_kernel::sleep(250);
                        led.set_low();
                        bern_kernel::sleep(750); // (10)
                    }
                });
        }).unwrap();

        bern_kernel::start(); // (11)
    }


The macro at (1) in the listing defines a process statically. The macro generates a linker section with the given name and checks the size requirements of the process memory. For example, on the Armv7E-M memory protection unit, a memory region must have a size of $2^n$ bytes and be aligned to its size. By defining processes at compile time, the linker can place the memory blocks with minimal memory waste.
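The size and alignment rule can be expressed as a compile-time check. This is a sketch of the kind of check such a macro could perform, not the code the Bern RTOS macro generates; `is_valid_region`, `region_size_for`, and `MIN_REGION_SIZE` are hypothetical names. The 32-byte minimum is the smallest region size of the Armv7-M memory protection unit.

```rust
/// Smallest region size of the Armv7-M MPU in bytes.
const MIN_REGION_SIZE: usize = 32;

/// A region is valid if its size is a power of two of at least 32 bytes
/// and its base address is a multiple of its size.
const fn is_valid_region(base: usize, size: usize) -> bool {
    size >= MIN_REGION_SIZE && size.is_power_of_two() && base % size == 0
}

/// Smallest valid region size that can hold `bytes` bytes.
const fn region_size_for(bytes: usize) -> usize {
    let rounded = bytes.next_power_of_two();
    if rounded < MIN_REGION_SIZE {
        MIN_REGION_SIZE
    } else {
        rounded
    }
}

// Evaluated by the compiler: an invalid size would abort the build.
const _: () = assert!(region_size_for(8192) == 8192);
```

A request for 6000 bytes would have to be rounded up to 8192 bytes, which illustrates why choosing power-of-two process sizes up front avoids wasted memory.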

Any static data can be placed in the process memory by specifying the memory section (2). If no link section is defined, the static data is placed in the globally shared section that all threads can access.

Before any other kernel function is called, the scheduler must be initialized (3) and the tick frequency set. Threads can only be spawned in the process context (4). A thread builder is created by calling Thread::new(c) (5). Now, the default priority can be changed (6) and a stack can be allocated for the thread (7). Note that thread priorities are global across all threads of all processes. There are no process priorities.

Lastly, the spawn() call consumes the builder and takes the thread entry point as a closure. The closure captures (8) its environment, i.e., it takes ownership of led. A typical thread contains an infinite loop; in this case, an LED (9) is toggled. A running thread can call static functions of the kernel (10), as the scheduler always knows which thread is running. Finally, the scheduler is started (11).

Because a closure can capture its environment, parameters are passed to the thread with their type information, and no casting of pointers is necessary. Moving captured variables into threads enables the compiler to check ownership and multi-threaded access. The compiler would, for example, report an error if led were to be used in main() after it has been moved into the closure.
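The ownership check works the same way with standard library threads, which makes it easy to try on a host machine. In the sketch below, `Led` is a hypothetical stand-in for a GPIO pin that only counts state changes, and `blink_in_thread` is an illustrative helper; neither is part of Bern RTOS.

```rust
use std::thread;

/// Hypothetical stand-in for an LED pin; it only counts state changes.
struct Led {
    toggles: u32,
}

impl Led {
    fn set_high(&mut self) {
        self.toggles += 1;
    }

    fn set_low(&mut self) {
        self.toggles += 1;
    }
}

/// Moves an `Led` into a thread and returns the number of state changes.
fn blink_in_thread(cycles: u32) -> u32 {
    let mut led = Led { toggles: 0 };
    let handle = thread::spawn(move || {
        // `led` is owned by this closure from here on.
        for _ in 0..cycles {
            led.set_high();
            led.set_low();
        }
        led.toggles
    });
    // Uncommenting the next line fails to compile with
    // error[E0382]: borrow of moved value: `led`
    // led.set_high();
    handle.join().unwrap()
}
```

The closure receives a fully typed `Led` value rather than an untyped pointer, and the compiler rejects any later use of `led` by the spawning code.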