Both ROM and RAM strongly determine the cost of an MCU. Therefore, an embedded developer should regard them as valuable resources. Especially for high-volume products (e.g. MCUs for cars), for which cheap hardware is important, it is crucial to optimize ROM and RAM usage.
In this post we assume the MCU supports execute-in-place (XIP). This means that code can be executed directly from ROM, without first being copied to RAM.
1. Isolate configuration
In general, low-end MCUs have significantly more ROM than RAM. For example, one of our development boards, the STM32VLDISCOVERY, has an STM32F100RB microcontroller with 128 KB Flash and 8 KB RAM. In this section we focus on saving RAM.
A well-known technique is to qualify data as ‘const’. Const data then becomes part of the read-only data section which is not copied from ROM to RAM at system start-up. Easier said than done… how to organize this in a practical way? How to code ‘for const’? The general guideline is to isolate settings and configuration.
make a static board configuration
Low-end embedded systems are very static. The hardware board and the required MCU features are well known and do not change at run-time. Static configuration should end up in ROM. One technique for defining a static board configuration is to create configuration tables, e.g. a table for the buttons (see this post). These tables can be made const if the data structures also follow the next rule…
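A minimal sketch of such a configuration table. The struct layout, the names and the port and pin numbers are made up for illustration and do not come from a vendor library:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one button on the board. */
struct button_config {
    uint8_t port;        /* GPIO port index, e.g. 0 = port A */
    uint8_t pin;         /* pin number within the port */
    uint8_t active_low;  /* 1 if a pressed button reads as logic 0 */
};

/* 'const' places the table in the read-only data section,
 * so on an XIP system it stays in ROM and costs no RAM. */
static const struct button_config board_buttons[] = {
    { .port = 0, .pin = 0,  .active_low = 0 },  /* assumed user button */
    { .port = 2, .pin = 13, .active_low = 1 },  /* assumed second button */
};

#define BOARD_BUTTON_COUNT (sizeof board_buttons / sizeof board_buttons[0])

/* Look up a button by index; returns NULL when the index is out of range. */
const struct button_config *board_button(size_t index)
{
    return index < BOARD_BUTTON_COUNT ? &board_buttons[index] : NULL;
}
```

Dropping the 'const' qualifier would move the whole table into the initialised data section, which is copied to RAM at start-up.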
explicitly separate state and settings
Settings are static while state is dynamic. By separating them you are able to make the settings ‘const’ where applicable. The more settings, the more important this split becomes. See Debounced Buttons for an example.
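The split can be sketched as follows. This is an illustration with hypothetical struct and function names, not the code from the Debounced Buttons post: the settings struct can live in ROM, while only the small state struct needs RAM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Settings: fixed at build time, so instances can be 'const' (ROM). */
struct button_settings {
    uint8_t pin;             /* hypothetical pin number */
    uint8_t debounce_ticks;  /* samples the input must differ before we accept it */
};

/* State: changes at run-time, so it must live in RAM. */
struct button_state {
    uint8_t change_count;    /* consecutive samples that differ from 'pressed' */
    bool pressed;            /* current debounced value */
};

/* Feed one raw sample and return the debounced button value. */
bool button_update(const struct button_settings *cfg,
                   struct button_state *st, bool raw_pressed)
{
    if (raw_pressed == st->pressed) {
        st->change_count = 0;            /* input agrees, nothing pending */
    } else if (++st->change_count >= cfg->debounce_ticks) {
        st->pressed = raw_pressed;       /* change has been stable long enough */
        st->change_count = 0;
    }
    return st->pressed;
}
```

With this layout each button costs two bytes of RAM for its state (on typical compilers), no matter how many settings are added later.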
define which hardware features are needed
MCUs always have more features than you need. Be sure to include only the code and data that you actually need. The techniques for doing so depend on the development platform (e.g. one could use a ‘CONFIG_’ define to select which hardware features are included in the build).
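A minimal sketch of the ‘CONFIG_’ approach. The switch names, buffer sizes and init functions are hypothetical; in a real project the switches would usually come from the build system (e.g. -DCONFIG_SPI=1) rather than from defaults in the source file:

```c
#include <stdint.h>

/* Hypothetical feature switches with defaults for this sketch. */
#ifndef CONFIG_UART
#define CONFIG_UART 1
#endif
#ifndef CONFIG_SPI
#define CONFIG_SPI 0
#endif

#if CONFIG_UART
static uint8_t uart_rx_buffer[64];   /* RAM is only spent when UART is enabled */

static int uart_init(void)
{
    uart_rx_buffer[0] = 0;
    return 0;
}
#endif

#if CONFIG_SPI
static uint8_t spi_tx_buffer[32];    /* not compiled in with CONFIG_SPI == 0 */

static int spi_init(void)
{
    spi_tx_buffer[0] = 0;
    return 0;
}
#endif

/* Initialise the enabled features; returns how many were set up. */
int board_init(void)
{
    int enabled = 0;
#if CONFIG_UART
    if (uart_init() == 0) {
        enabled++;
    }
#endif
#if CONFIG_SPI
    if (spi_init() == 0) {
        enabled++;
    }
#endif
    return enabled;
}
```

With the defaults above, neither the SPI code nor its buffer ends up in the image.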
2. Write good code
Good code has high cohesion and low coupling (see this post) and is therefore easier to optimize.
understand the difference between a driver and a device
A driver ‘drives’ or controls a physical device (or a hardware peripheral or a virtual device). The driver is implemented as a collection of functions in the source code (and it might fit in a specific kernel framework as well).
The device (in the source code) is the representation of one instance of the physical device, i.e. a data struct which is passed as an argument to the driver functions.
The consequence is that one driver can support several device instances (e.g. an SPI driver can control two SPI buses, SPI0 and SPI1). In any case, try to avoid driver code that looks like: if(SPI0) then do_this() else if (SPI1) then do_that().
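A sketch of one driver function serving two device instances. The struct, the function name and the register variables are hypothetical; the two ‘fake’ registers stand in for memory-mapped hardware registers so that the example also runs on a host PC:

```c
#include <stdint.h>

/* Stand-ins for the memory-mapped data registers of two SPI peripherals. */
static volatile uint32_t fake_spi0_data_reg;
static volatile uint32_t fake_spi1_data_reg;

/* The device: everything that differs between instances. */
struct spi_device {
    volatile uint32_t *data_reg;  /* data register of this instance */
    uint32_t clock_hz;            /* assumed bus clock setting */
};

/* Two const device instances; they can live in ROM. */
static const struct spi_device spi0 = { &fake_spi0_data_reg, 1000000u };
static const struct spi_device spi1 = { &fake_spi1_data_reg, 8000000u };

/* The driver: one function, no 'if (SPI0) ... else if (SPI1) ...'. */
void spi_write_byte(const struct spi_device *dev, uint8_t byte)
{
    *dev->data_reg = byte;
}
```

Adding a third bus then means adding one more device instance, not another branch in every driver function.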
improve domain knowledge
This one is probably the most important item in this post. Poor domain knowledge likely results in bad code: ill-defined structs or classes with poorly assigned responsibilities, which in turn lead to low cohesion and high coupling.
3. Toolchain configuration
Instruction set configuration
While the Cortex-M3 always executes the Thumb-2 instruction set (a mix of 16-bit and 32-bit instructions), its predecessor, the ARM7, had (and still has) two options:
- a 32-bit ARM mode and
- a 16-bit Thumb mode.
Thumb mode has fewer but smaller instructions than ARM mode. Therefore, in Thumb mode, two 16-bit instructions fit in one 32-bit location, but in general more instructions are needed to get something done.
ROM savings are typically 20-30% (smaller instructions), at the cost of roughly 20-30% performance (more instructions to do the same work).
Most compilers are able to optimize code in one way or another. Optimization is finding a balance between:
- speed (CPU performance, e.g. -O2 for gcc) and
- size (code compactness, e.g. -Os for gcc).
The main drawback of optimization is a build which is harder to debug: the optimized, ‘reshuffled’ instructions and addresses no longer match the layout you would expect from the non-optimized source code.
Remove unused sections
Some linkers can ‘garbage collect’ unused sections, which can reduce your build’s ROM/RAM footprint.
For example, this command:
gcc -fdata-sections -ffunction-sections file.c -o file.o -Wl,--gc-sections
tells
- gcc to create a section for each function (.text.function_name) and each variable (e.g. .data.var_name), and
- ld to discard unreferenced sections.
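To make the effect concrete, here is a small sketch of a source file with one referenced and two unreferenced symbols. All names are made up, and the section names in the comments are the ones gcc typically generates with the two flags above:

```c
#include <stdint.h>

/* Referenced by checksum_demo(): placed in .text.checksum8 and kept. */
uint8_t checksum8(const uint8_t *data, uint32_t len)
{
    uint8_t sum = 0;
    while (len--) {
        sum += *data++;
    }
    return sum;
}

/* Never referenced: placed in .text.unused_helper,
 * so the linker can discard it with --gc-sections. */
uint32_t unused_helper(uint32_t x)
{
    return x * 3u;
}

/* Never referenced: placed in .data.unused_table (256 bytes of
 * initialised data on typical targets) and equally discardable. */
uint32_t unused_table[64] = { 1 };

/* Assumed to be reachable from the program's entry point. */
uint8_t checksum_demo(void)
{
    static const uint8_t msg[] = { 1, 2, 3, 4 };
    return checksum8(msg, sizeof msg);
}
```

Without the two compiler flags, all functions of a file typically share one .text section, so the linker has to keep or drop them together.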
Strip debugging information
For systems (e.g. Linux) that load an ELF image, you want the debugging information to be stripped from your target image. As long as the debugging info is available on your host system, debugging remains possible.
For low-end embedded systems this is typically not relevant, as they download a raw binary image, which contains no debugging information, to the target.