System Tracing

System logs provide insight about the events in the application and kernel. Log messages require processing power on the target to be transmitted and rarely provide precise timing information. System tracing, on the other hand gives event information with little overhead. Simple tracing can be performed using digital signal on an output pin and a measurement device. Using a separate software tool and a stream of trace data provides a detailed view of thread interaction.

Design

The kernel uses a tracing abstraction so that the user can select a tracing back-end of choice. The dependencies of the tracing library are illustrated in the figure.

tracing-deps
Figure: Tracing library dependencies.

A kernel component such as the scheduler uses the static tracing functions. Then, the tracing library calls the functions from the tracing back-end. However, the tracing back-end is an unknown dependency at the time the tracing library is compiled. To overcome this issue, function shims and extern functions are used. A function shim simply calls the corresponding external function.

The set of tracing functions is defined in the RtosTracing trait. A tracing back-end (e.g. SystemView) is required to completely implement the trait. With the help of a macro the extern functions are generated. Therefore, the dependency from the tracing library is now resolved at link time instead of compile time. A similar method is used for Rusts global allocator.

alloc-sequence
Figure: Allocation sequence diagram from inside a process.

First the GUI thread makes an allocation request using boxed::Box::new(...) from Rusts alloc library. The library then calls the global allocator implemented in the kernel. From there, with a system call, the program switches into kernel mode so that it can access the kernel internals. The global allocator then requests the allocator from the currently running process. The memory block is then allocated and the system call ends. Finally, the global allocator returns a block a memory to the allocator, which returns a type safe object to the thread.

\notebox{ At the moment the bump allocator is the default for all processes. Other allocators are not yet available for processes.

The new `allocator_api` [[44]](/bibliography.md#ref-44) will allow the user to select an allocator for every object, i.e. `Box::try_new_in(42, MyAllocator)`. The new allocator API will be supported as soon as the feature is stabilized.

}

Bump Allocator

A bump allocator provides memory in a fast and deterministically. Its implementation is simple because a bump allocator cannot free memory. [40]

the figure illustrates the bump allocator over time.

alloc-bump
Figure: Bump allocator memory view.

The management structure contains a pointer to a block of memory used for the allocation. Initially the pointer is at the base of the block. Whenever a thread requests memory, the pointer address increases accordingly. Denoted in grey are areas which are wasted due to alignment. When C is allocated it the pointer address again increments according to the request size.

In the last column B is deallocated. This scenario should never occur as the bump allocator cannot free memory. Hence, the allocator pointer does not change its position and Bs memory block goes to waste. The allocator logs a warning whenever a deallocation attempt is made. Thanks to its simplicity, the bump allocator is lock free.

\notebox{ As the bump allocator never frees memory all memory request are known at compile time. They could be replaced by a static variable. A static allocation has the advantage that an out of memory exception is caught at compile time. However, static memory must be initialized statically. This is an issue because for example a mutex needs an ID from the kernel. In that case, the user must declare the mutex statically and then request initialization from the kernel. There is a chance that the user forgets the second step and the mutex has no valid ID. To overcome this issue, lazy initialization could be used where the mutex is initialized on the first call. However, that introduces runtime overhead from the initialization checks.

Because of to separation of the kernel and application memory for there are at least two variables for every kernel object. For example, a thread has a stack in application code, but the thread table entry is in kernel memory. Thus, static allocation for a thread is a little more complicated than with a conventional RTOS. In the long term, the goal is to replace the bump allocator with static allocations to provide stronger guaranties at compile time.

}

Heap Allocator

In a dynamic system with many short lived variables static allocation might not fit into memory. A heap can allocate memory for variables and free the memory block at the end of the variables lifetime.

The heap allocator in Bern RTOS uses a free-list and is based on [40]. The memory view over time is shown in the figure.

alloc-heap
Figure: Free-list based heap allocator memory view.

At the beginning, the free-list just contains one node. The node stores the size of the free block and the pointer to the next block. After some time, there might be some holes between variables with longer lifetimes. The second column in the figure shows two free nodes in the list. The nodes are sorted in an ascending order to allow for block merging.

When the application requests memory for C the allocator searches the free-list for the first large enough free block. There is some memory left after C is allocated. The allocator places a new node in that free block.

After some time, the lifetime of B ends and the application request deallocation of its memory. First, a new node is placed in Bs place. The node is then sorted into the free-list. As a last step, the allocator checks if there is one or more adjacent free blocks and merges them.

Merging free blocks reduces memory fragmentation drastically, but does not completely resolve the issue. A heap works well as long as all variables remain in scope for a similar amount of time. If there are a few long living and some very short lived variables more and more holes will prevent efficient memory usage.

Concerning functional safety, note that heap allocation is not deterministic. The number of free nodes varies greatly at runtime and so does the search time for a free memory block.

\notebox{ The current implementation can only merge blocks with ascending addresses. Memory fragments very quickly. Therefore, the heap cannot be selected as allocator strategy for the moment. }

Memory Pool

The memory pool or fixed block allocator is an attempt on a deterministic and fast heap. It combats fragmentation by allocating fixed blocks of memory. [9, pp. 279]

the figure illustrates a memory pool over time.

alloc-pool
Figure: Memory pool allocator memory view.

An empty memory pool consists of multiple partitions with free memory blocks. There is a free-queue for each partition. After some time, the order of the nodes in the free-queue can change. Note that variables are mostly not a perfect fit and leave some unused space in the memory block (grey).

When the application requests memory for D the allocator looks through the partition table for the first partition with a large enough and free block. The search is fast and has bounded execution time because there are a constant number of partitions. If a partition fits the requirements, the first block is popped from the queue.

Deallocating a variable (in this case D) is straightforward. In its deallocation request, the application passes the address of D. The pool allocator then checks which partition the address belongs to. Finally, a free node is pushed to the back of the partition.

The free-queue is implemented atomically. Thus, the allocation is lock-free and interrupts do not need to be disabled.