When developing Embedded Software for memory constrained microcontrollers (typically using Embedded C), the most common constraint is the memory – in particular, RAM is often the most constrained resource. This issue is further exacerbated when using a multithreaded operating system, where many threads are running simultaneously – each allocates some RAM for its stack. This leaves very little memory for static allocation and for doing other tasks. Many projects are also constrained by quality guidelines such as MISRA, which, for good reasons prohibits the use of dynamic memory.
So all in all, one is often placed in a very difficult predicament. The options are:
- Pay extra for a much bigger micro with more memory (expensive),
- Use external RAM (its also expensive, wastes real-estate on your PCB, and usually slower than onboard RAM),
- Find ways of using the memory more efficiently (there are many things that can be done here).
I am going to focus on using RAM more efficiently. There are many things that can be done here, not the least being to reduce the thread count to an absolute minimum. However, once all other options have been exhausted, you are often forced to go the dynamic memory route (yes, shock and horror). Well, once you have got over this shock, you will need to start focussing on making the most of this bad situation.
One major criticism of using dynamic memory is the fact that it can easily become fragmented. I will be describing a mechanism that I have very successfully used to prevent this fragmentation from becoming a problem.
Causes of fragmentation:
Dynamic memory is basically a continuous linear region of otherwise unused memory (on the heap) that is split up into different sized chunks according to the user’s requirements. It often resembles a double-linked list where each chunk is linked to the next chunk and to the previous one. Each chunk is the size that was requested when the malloc was called, plus a bit of overhead for pointers describing its location relative to other chunks. When a chunk is freed, it leaves a gap. When the malloc instruction is called, the dynamic allocation system first scans through the link-list searching for an unused block of memory big enough for what was requested. If it finds a gap, it is often bigger than what was required. This is often where fragmentation occurs – these gaps seldom get filled optimally, causing wasted space and your heap to become porous and to expand over time.
To illustrate a scenario, let me list typical steps. To keep things simple, let’s assume we have 3KB of space on the heap:
- Process A allocates 1KB of memory,
- Process B allocates 1KB of memory,
- Process A releases its memory,
- Process C tries to allocate 1.1KB of memory,
- The dynamic memory system cannot find a linear 1.1KB section of memory, so returns a NULL, causing the system to crash, halt or behave unexpectedly.
In a recent project, I needed to implement a thread-safe dynamic memory implementation where memory was very constrained (I also couldn’t use any blocking mechanisms since the application needed to work near real-time). Memory fragmentation was a real concern, so in order to solve this issue, I employed the following rules:
- Processes that need memory for relatively short periods of time (less than a few seconds) can allocate it and free it as per normal practice,
- As a housekeeping task, processes (“memory owners”) that need memory for longer periods of time need to defragment their memory by attempting to removing any memory gaps that occur before them,
- Done collaboratively, “memory owners” work together to efficiently eliminate memory fragmentation,
The basic mechanism of closing a memory gap occurring before an allocated memory chunk is simply done as follows:
- Allocate a new chunk (chunk B) of memory equal to the size of the chunk that needs to fill the gap.
- Copy the old chunk to chunk B,
- it is done like that in order to always preserve a continuous intact copy of the memory (other threads or interrupts may need access to it).
- Free the old chunk,
- at this point, the old chunk becomes a gap that then gets automatically merged with any previous gap there may have been.
- If chunk B was allocated earlier in memory than chunk A, the defragmentation process can be terminated and you can skip the following steps.
- Now allocate another chunk (chunk C) of memory equal to the size of chunk B
- This will usually be allocated at the start of the merged gap that has been created in the step above.
- Copy chunk B to chunk C
- Free chunk B
At this point, the fragmentation will have been eliminated and your stack is packed nice and tightly – no wasted space.
The process described above could be guaranteed because I had full control of the dynamic memory allocation algorithm (I developed, customized and optimized it for maximum performance and reliability). Although I haven’t tried this mechanism with other off the shelf dynamic memory allocation systems, I am confident many could be made to work to varying degrees of success. However, it would pay to do your homework before implementing such a scheme to make sure it is suitable (i.e. not a too simplistic a dynamic memory implementation).