In summary, the final firmware can control actuators based on measurements from RTD temperature, flow and pressure sensors. The display shows live measurements. The user can control the machine using the tactile switches. Measurements are also sent to a computer on the local network for temperature control analysis.

One goal of the application was to implement these features and create a usable prototype machine. However, we should also assess the memory footprint and the performance of the application. Finally, the real-world use case was meant to reveal issues in Bern RTOS, which we will review as well.

Memory Footprint

Memory on microcontrollers is typically scarce. Hence, an RTOS should be as small as possible for the features it provides. Cargo bloat [18] is a tool that groups the symbols in the linker output map by the crates used in a project. The result involves some guesswork, as functions may be inlined outside their own crate. Using the tool, the application was analyzed in debug and release build mode. As an intermediate step, there is also a debug mode in which all application dependencies are built in release mode. The results are listed in the table.


The binary size varies greatly with the optimization level, as indicated by the total flash size. The biggest contributor is the GUI. That comes as no surprise, as an image is stored in flash and graphics libraries are typically large. Many binary symbols from the GUI are not correctly attributed by the tool and are counted towards the other section. The release build without the GUI resulted in a 66 kB binary. Some crates produce larger binaries in release mode than in debug mode, which might be due to inlining moving code from one crate to another.

During development, it became evident that optimizing dependencies is necessary in debug mode: not only does it halve the binary size, it also greatly improves timing.

Bern RTOS uses 6.1 kB of flash memory in release mode. However, its impact on the binary size is larger, because some kernel code may be inlined into the application. Additionally, there are macros that generate RTOS-related code in the user application.


Real-Time Performance

More important than the impact of an RTOS on the binary size is its real-time performance. In contrast to static size analysis, timing is evaluated at runtime using tracing functions. SEGGER SystemView is one of the few tools for RTOS analysis. It is designed to visualize and analyze traces from an RTOS in real time.

Figure: Firmware module hierarchy and moving resources.

The initialization phase starts when the program enters the main function. At that point, the CPU has access to the entire system. Hence, the board is set up and split into parts that are passed to the different processes. Additionally, the inter-process message queues (blue) between processes are allocated. Main then moves these resources into the processes. Processes are loosely coupled through message queues, and each is implemented in its own module.

A process then allocates the synchronization primitives (red) for its threads and passes the resources on to them. Finally, the kernel starts, memory barriers are set up, and resources can no longer be moved.
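This resource-moving scheme maps directly onto Rust's ownership model: each peripheral and queue endpoint is moved exactly once. The following host-side sketch illustrates the pattern with std threads and channels standing in for kernel threads and message queues; all type and function names are illustrative assumptions, not the Bern RTOS API.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical board peripherals; the names are illustrative only.
struct Sensors;
struct Heater;

// The board is split once in main, moving each part into exactly one process.
struct Board {
    sensors: Sensors,
    heater: Heater,
}

// A "process" is modeled here as a function taking ownership of its resources.
fn sensor_process(sensors: Sensors, tx: mpsc::Sender<f32>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let _ = sensors; // the sensors now live inside this process only
        tx.send(92.5).unwrap(); // publish a measurement over the queue
    })
}

fn control_process(heater: Heater, rx: mpsc::Receiver<f32>) -> f32 {
    let _ = heater; // the heater now lives inside this process only
    rx.recv().unwrap() // block on the inter-process message queue
}

fn main() {
    let board = Board { sensors: Sensors, heater: Heater };
    // The inter-process message queue is allocated in main,
    // and its endpoints are handed out to the processes.
    let (tx, rx) = mpsc::channel();
    let handle = sensor_process(board.sensors, tx);
    let temperature = control_process(board.heater, rx);
    handle.join().unwrap();
    println!("{temperature}");
}
```

After the split, the compiler rejects any further use of `board.sensors` or `board.heater` in main, which is exactly the guarantee the initialization phase relies on.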

This structure splits a complex problem into manageable and loosely coupled subproblems. Each process has its own memory region with an allocator. Therefore, memory shortages can be traced back to the threads within that process. However, at the moment, thread priorities are global: the process level is skipped and priorities are scattered over multiple files. As thread priorities greatly influence the performance of the application, they should be managed at the process level. As a solution, a process priority could be introduced. Process priorities would be set in main; then, within each process, the priorities of its threads would be set locally. Hence, no knowledge of the whole application is necessary.
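One way to realize such a two-level scheme is to derive the global priority from a process priority and a process-local thread priority. This is a minimal sketch of the idea, not the Bern RTOS API; the encoding and the lower-is-higher convention are assumptions.

```rust
// Illustrative two-level priority scheme (assumption: lower value = higher priority).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Priority(u8);

// Assumed upper bound on threads per process, used to partition the range.
const THREADS_PER_PROCESS: u8 = 8;

// The effective global priority combines a process priority set in main
// with a thread priority set locally inside the process module.
fn effective_priority(process: u8, thread: u8) -> Priority {
    Priority(process * THREADS_PER_PROCESS + thread)
}
```

With this encoding, every thread of a higher-priority process outranks every thread of a lower-priority one, so a process module only has to order its own threads.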

At the moment, the kernel provides the synchronization primitives mutex and semaphore. Using a semaphore to flag an event from an interrupt is, however, more cumbersome than necessary. Often, threads only wait for a signal that an event occurred. The kernel should also provide this simpler event primitive.
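The semantics of such an event primitive can be sketched on the host with std types; on the target, the kernel would implement the same interface with its own wait queues. This is a sketch of the proposed primitive, not an existing kernel type.

```rust
use std::sync::{Condvar, Mutex};

// Sketch of the proposed event primitive: a boolean flag plus a wait queue.
#[derive(Default)]
struct Event {
    flag: Mutex<bool>,
    cond: Condvar,
}

impl Event {
    // Called from the signaling context (on target: an interrupt handler)
    // to flag the event and wake a waiting thread.
    fn set(&self) {
        *self.flag.lock().unwrap() = true;
        self.cond.notify_one();
    }

    // Blocks the calling thread until the event is set, then clears it.
    fn wait(&self) {
        let mut flag = self.flag.lock().unwrap();
        while !*flag {
            flag = self.cond.wait(flag).unwrap();
        }
        *flag = false;
    }
}
```

Compared to a semaphore, there is no counter to manage: an interrupt calls `set` and the thread calls `wait`, which matches the common "wake me when it happens" use case directly.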

The entry point of a thread is a function that contains some initialization followed by an infinite loop. One issue with the infinite loop is that testing is suboptimal, because feeding test data into a running loop takes more work than calling a function. Another problem is waiting on one of many events. As embedded systems are mostly event-driven, the CPU often waits for events, yet the kernel currently does not support waiting on more than one event at a time. For example, the network log thread polls multiple message queues to check whether there is any new data. The kernel API could be remodeled so that pending events are signaled to the thread entry function in its context. Most importantly, the entry function would always run to completion. That solution would simplify both testing and event processing.
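The run-to-completion idea can be sketched as an event handler that the kernel would invoke once per pending event instead of an application-owned infinite loop. All names here are assumptions chosen to mirror the network log example; this is not the current Bern RTOS API.

```rust
// Events the network log thread might receive from its message queues (assumed names).
#[derive(Debug, PartialEq)]
enum LogEvent {
    Temperature(f32),
    Pressure(f32),
}

// State the thread keeps between invocations.
struct NetworkLog {
    lines: Vec<String>,
}

impl NetworkLog {
    // Runs to completion for each event. Because it is a plain function,
    // a unit test can call it directly with crafted events.
    fn on_event(&mut self, event: LogEvent) {
        match event {
            LogEvent::Temperature(t) => self.lines.push(format!("T={t}")),
            LogEvent::Pressure(p) => self.lines.push(format!("P={p}")),
        }
    }
}
```

With this shape, the kernel owns the loop and the wait-for-any-event logic, and the application only supplies short handlers, which sidesteps both the testing and the multi-event problem.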

Regarding testing, there are currently no mocks or simulation of the kernel. Thus, the application can only be tested at the unit level; integration tests on the host system are impossible. Traits abstracting RTOS dependencies, along with mocks, should be added to the kernel.
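Such an abstraction could look like the following sketch, where application code depends on a trait rather than on a concrete kernel type, and a host-side mock implements the same trait. The trait and its single method are assumptions for illustration.

```rust
// Assumed trait abstracting a kernel message queue.
trait MessageQueue<T> {
    fn try_recv(&mut self) -> Option<T>;
}

// Host-side mock backed by a Vec, usable in integration tests.
struct MockQueue<T> {
    items: Vec<T>,
}

impl<T> MessageQueue<T> for MockQueue<T> {
    fn try_recv(&mut self) -> Option<T> {
        if self.items.is_empty() {
            None
        } else {
            Some(self.items.remove(0))
        }
    }
}

// Application code is generic over the trait, so it runs unchanged
// against the real kernel queue on target and the mock on the host.
fn drain<Q: MessageQueue<u32>>(queue: &mut Q) -> Vec<u32> {
    let mut out = Vec::new();
    while let Some(value) = queue.try_recv() {
        out.push(value);
    }
    out
}
```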

Lastly, some API functions rely on dynamic memory allocation and others on statically sized data. The API should be usable with both static and dynamic memory allocation, and the chosen allocation scheme should have minimal impact on how the kernel is used. For example, initializing a thread with a statically allocated stack should be almost identical to initializing one with a dynamically allocated stack.
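One way to achieve that parity is a single stack type with two constructors, so the rest of the thread API is identical for both allocation schemes. This is a sketch under assumed names, not the Bern RTOS API; leaking the heap buffer is a host-side shortcut to obtain the same `'static` lifetime a real kernel allocator would guarantee.

```rust
// A thread stack is a 'static byte slice, regardless of where it came from.
struct Stack {
    memory: &'static mut [u8],
}

impl Stack {
    // Static allocation: the caller provides a 'static buffer,
    // e.g. a `static mut` array placed by the linker.
    fn from_static(memory: &'static mut [u8]) -> Self {
        Stack { memory }
    }

    // Dynamic allocation: leak a heap buffer to obtain the same 'static slice
    // (in a real kernel, the allocator would own and track this memory).
    fn allocate(size: usize) -> Self {
        Stack { memory: Box::leak(vec![0u8; size].into_boxed_slice()) }
    }

    fn size(&self) -> usize {
        self.memory.len()
    }
}
```

Since both constructors yield the same `Stack`, a hypothetical `Thread::spawn(stack, entry)` would not need to know which scheme was used.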