Anatomy of a Memory Manager
by Stewart Lynch on 05/20/13 09:12:00 am   Featured Blogs

The following blog post, unless otherwise noted, was written by a member of Gamasutra's community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

 

Each new console generation has significantly more memory than the last. It’s tempting to think that in the next generation memory won’t matter. Whilst it’s true that memory doesn’t matter as much as it did in the 8-bit days, I’ve yet to work on a game that didn’t have memory problems at some point during development.

The level of memory management I will discuss in this blog is at the new/delete level. By default this is handled by the standard malloc/free, but this can be overridden with a custom malloc replacement, such as dlmalloc, Hoard allocator, jemalloc or my own VMem allocator. Whilst all of these allocators have different performance characteristics, they all have some things in common, and it’s useful to have a working knowledge of what they are doing under the hood. Whatever you choose in your game, whether you stay with malloc/free or use a malloc replacement, if you are programming in C++ there are certain things that you should know about memory management.

A memory manager usually consists of at least 3 types of heaps:

  • Fixed Size Heap – used for small allocations
  • Coalesce Heap – used for larger allocations
  • Large Heap – usually handled directly by the system

 

Figure 1: A conceptual view of a simple memory manager


 

Fixed Size Heap

The fixed size heap consists of a number of fixed size allocators (or FSAs for short). They are called fixed size because each allocator only handles allocations of a specific size. So, for example, there may be separate FSAs for each allocation size from 0 – 256 bytes. When you ask for an allocation the memory manager will check if it is in this size range; if it is, it will select the relevant FSA and use that to allocate the memory.

The number of FSAs in a memory manager varies dramatically. Some memory managers only have FSAs from 0 – 128, others from 0 – 512, and some don’t have them at all. There are diminishing returns of going over 512 simply because there are usually not that many allocations of each size and the FSAs become wasteful of memory. However, if you know that you are making many allocations all of the same size and you suspect they are not going through an FSA, it is probably a good idea to route them through your own allocator.

Figure 2: FSA sizes. Note that the spread might not be even (and on 64-bit systems each FSA size will be a multiple of 16 to ensure alignment).


The benefits of FSA heaps are twofold. They are fast and they eliminate spatial fragmentation. I will discuss fragmentation in more detail later, so for now let’s focus on the speed. The way an FSA works is that it allocates memory from the system in large blocks at a time, for example one 4K system page. It then splits that block up into equal sized slots. These slots are linked into a free list. When this allocator is asked to allocate memory, all it has to do is to take one of those slots off a free list. When freeing it simply pushes the allocation back onto the free list.

Figure 3: Example of an FSA page


You may be thinking this is oversimplified, and you’d be right. Finding out where an allocation came from is a non-trivial problem. It’s too slow to ask each allocator if it allocated the allocation, so memory managers employ various techniques to get around this, such as storing information in each allocation, or comparing addresses against reserved page ranges. Freeing memory is often much slower than allocating memory for this reason; one of the differentiating factors between allocators is how they solve this problem. This is too detailed to go into here, but the take-away fact is that freeing memory is often slower than allocating it.
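As an example of one such technique (a hypothetical sketch of one common trick, not a claim about how any specific allocator works): if FSA pages are allocated aligned to their size, each page can begin with a small header pointing back at its owner, and freeing simply masks the address down to the page boundary to find it:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: each page-aligned FSA page starts with a header
// pointing back at the allocator that carved it up. Freeing a pointer
// masks the address down to the page start to recover the owner.
const std::uintptr_t PAGE_SIZE = 4096;

struct PageHeader
{
    void* owner;    // the FSA that owns this page
};

inline PageHeader* GetPageHeader(void* p)
{
    // Clamp the address down to the start of its (aligned) page.
    std::uintptr_t page = reinterpret_cast<std::uintptr_t>(p) & ~(PAGE_SIZE - 1);
    return reinterpret_cast<PageHeader*>(page);
}
```

With this layout a free is a mask, a load and a call, regardless of how many FSAs exist.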

FSAs work best when there are many high frequency, short-lived allocations. In a large game that makes use of the STL it isn’t uncommon to have around 40,000 small allocations per second. FSAs also work fine for longer-lived allocations as long as they are allocated before anything else. Mixing short- and long-lived allocations will lead to temporal fragmentation. FSAs are essential because they are very fast and eliminate spatial fragmentation.
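To make the free-list mechanics concrete, here is a minimal single-block sketch (the names and the 4K block size are mine for illustration; a real FSA manages many pages, grows on demand, and handles alignment):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch of a fixed size allocator: one block carved into
// equal sized slots threaded onto an intrusive free list. The slot
// size must be at least sizeof(void*) so a free slot can hold the link.
class FSA
{
public:
    explicit FSA(size_t slot_size, size_t block_size = 4096)
        : m_Block(block_size)
        , m_FreeList(nullptr)
    {
        // Chop the block into slots and push each onto the free list.
        size_t slot_count = block_size / slot_size;
        for (size_t i = 0; i != slot_count; ++i)
        {
            void* slot = m_Block.data() + i * slot_size;
            *static_cast<void**>(slot) = m_FreeList;
            m_FreeList = slot;
        }
    }

    // Allocating is just popping the head of the free list.
    void* Alloc()
    {
        if (!m_FreeList)
            return nullptr;     // block exhausted
        void* slot = m_FreeList;
        m_FreeList = *static_cast<void**>(slot);
        return slot;
    }

    // Freeing is just pushing the slot back onto the free list.
    void Free(void* p)
    {
        *static_cast<void**>(p) = m_FreeList;
        m_FreeList = p;
    }

private:
    std::vector<uint8_t> m_Block;
    void* m_FreeList;
};
```

Note that both Alloc and Free are a handful of instructions, which is why small allocations can be so cheap.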

Coalesce Heap

Coalesce heaps consist of one or more coalesce allocators. The coalesce heap will handle all allocation sizes that are not handled by the FSA heap. A coalesce allocator is very different from an FSA. For a start, a single allocator can return allocations of different sizes. A coalesce allocator will first request a large region of memory from the system, say 32MB. The allocator keeps track of the allocated and free ranges of memory in this region. When an allocation is requested, it searches for the best free range to use. If it finds a free range that fits the allocation exactly, it marks the entire range as allocated. If not, it splits the free range into an allocated range and free range. A big differentiating factor between memory managers is how they decide on the best free range to use and this has a big impact on how well they limit fragmentation.

The coalesce allocator gets its name from what it does when it frees allocations. When an allocation is freed it marks the range as free. If the next consecutive range is already free, then these two ranges are merged, or coalesced, into one large range. Similarly, if the previous range is free it coalesces with that range. This coalescing is important because it prevents the free space from fragmenting, and means that it is less likely that large allocations will fail when there is memory pressure. This coalescing can happen immediately or can be delayed, but this is an implementation detail; there are benefits and drawbacks to both.
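The coalescing step can be sketched as follows, tracking free ranges as offset/size pairs in a sorted map (purely illustrative; real allocators use faster structures such as boundary tags or binned free lists, but the merging logic is the same idea):

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Illustrative sketch of coalescing on free. Free ranges are tracked
// as offset -> size entries in a sorted map; freeing a range merges it
// with the previous and/or next range when they are adjacent.
struct CoalesceRanges
{
    std::map<std::size_t, std::size_t> m_Free;  // offset -> size

    void Free(std::size_t offset, std::size_t size)
    {
        auto next = m_Free.lower_bound(offset);

        // Coalesce with the next range if it starts where this one ends.
        if (next != m_Free.end() && offset + size == next->first)
        {
            size += next->second;
            next = m_Free.erase(next);
        }

        // Coalesce with the previous range if it ends where this one starts.
        if (next != m_Free.begin())
        {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset)
            {
                prev->second += size;
                return;
            }
        }

        m_Free[offset] = size;
    }
};
```

Freeing the allocation between two free neighbours collapses all three into a single large range, which is exactly what keeps large allocations viable under memory pressure.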

Figure 4: Example of a coalesce range and what happens when a range is freed


 

Large Heap

Some allocators will use coalesce heaps for large allocations, whilst others simply fall back to the system allocation functions (e.g. VirtualAlloc/HeapAlloc). Because there are usually fewer large allocations, and because they will not fragment memory, large allocations are not a concern.

 

Fragmentation

Memory fragmentation is an important consideration for all games, especially those on a platform with fixed memory. Fragmentation is basically the measure of unused memory that a memory manager has committed, that for various reasons probably won’t be re-used. Over time, the memory used becomes increasingly spread out, and the same amount of allocated memory takes up more and more system memory because of all the wastage. You may find that even though you have enough free memory to make an allocation, there isn’t a single free block of memory that is big enough to hold that allocation, and so the allocation fails. If you are not aware of fragmentation, the first you might know of it is when your game fails a long soak test a few weeks before shipping; certainly not the best time to find this out. It is generally accepted that most memory managers will plateau at 10% loss to fragmentation, but I’ve seen plenty of instances where fragmentation was significantly worse than this.

There are two types of fragmentation: spatial and temporal.

Spatial Fragmentation

Spatial fragmentation happens in coalesce heaps. If the free space that the coalesce heap decided to use for an allocation isn’t exactly the right size, there will be an offcut. If this offcut is small enough it will be referred to as a fragment, and the important thing to realise is that unless the allocation is freed and the fragment is coalesced, the fragment won’t be re-used.

 

Figure 5: Spatial fragmentation


Imagine many thousands of allocs and frees and you can easily see how lots of allocation sizes might not quite fit and you are left with many small fragments of memory. Because allocations are usually freed in a random pattern, this fragmentation usually increases over time. In theory these fragments could be re-used because the allocation that caused the fragment could be freed, the fragment would be coalesced back into the free range and an allocation of the perfect size could come along and use the entire range. However, as we know, entropy is a fundamental rule of reality and this rarely happens.

It’s tempting to say that we should fill in these fragments using the small allocations instead of using FSAs, but consider how memory would look if we did that, and then freed the larger allocations. Memory would become peppered with small allocations which would prevent us from making larger allocations. Coalesce heaps fragment easily, and combining small and large allocations is a bad idea.

It’s easy to see how FSAs eliminate spatial fragmentation. Because each allocator only deals with allocations of one size, no fragments are ever created. You might be wondering why we don’t just use FSAs for everything; as mentioned earlier there are diminishing returns as the allocation sizes become larger, because there are fewer allocations of each size.

Temporal Fragmentation

Temporal fragmentation affects both FSAs and coalesce allocators. It is caused by the mixing of short-lived allocations with long-lived allocations. A simple example illustrates this: imagine a slew of small short-lived allocations, allocated by one of the FSAs. The FSA allocates a new page from the system, chops it up and returns the allocation slots. However, in the middle of these allocations is one allocation that is long-lived. When all of the short-lived allocations are freed, the long-lived allocation remains, meaning that the page can’t be released back to the system. The allocator is left with a mostly empty page of memory that might not be re-used.

Figure 6: Temporal fragmentation


Memory managers can’t do much about temporal fragmentation because they have no knowledge of how long each allocation will live for. This means it’s down to us to limit this as best we can. The biggest culprit of temporal fragmentation is strings. Strings are usually small enough to go into the FSA heap, they are often long-lived and they are often created at random times throughout the game’s lifetime. For example, a level load will often result in a spike of memory. If a long-lived string happens to be allocated at the peak, you may find that the memory doesn’t return to its previous level. The easiest way around this is to route all string allocations through their own allocator.
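As a sketch of what routing strings through their own allocator can look like in C++ (the byte-counting heap here is a stand-in I've invented for illustration; a real implementation would plug in a dedicated FSA or coalesce heap behind the same interface):

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Sketch: route all string memory through a dedicated allocator by
// parameterising std::basic_string. The global byte counter stands in
// for a real string heap; all names here are invented for illustration.
static std::size_t g_StringHeapBytes = 0;

template <class T>
struct StringHeapAllocator
{
    using value_type = T;

    StringHeapAllocator() = default;
    template <class U>
    StringHeapAllocator(const StringHeapAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        g_StringHeapBytes += n * sizeof(T);   // a real heap would carve from its own pages
        return static_cast<T*>(std::malloc(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t n)
    {
        g_StringHeapBytes -= n * sizeof(T);
        std::free(p);
    }
};

template <class T, class U>
bool operator==(const StringHeapAllocator<T>&, const StringHeapAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const StringHeapAllocator<T>&, const StringHeapAllocator<U>&) { return false; }

// All heap allocations made by GameString now go through the string heap.
using GameString =
    std::basic_string<char, std::char_traits<char>, StringHeapAllocator<char>>;
```

With this in place, long-lived string allocations stay out of the general FSA pages and can no longer pin them open.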

 

Best Practices

Armed with this knowledge about the internals of a memory manager, we can come up with some guidelines on best practices.

  • Different sized allocations go through different heaps and so have different considerations. Small allocations go through the quick low fragmentation FSA heaps, larger allocations go through slower coalesce heaps. Small allocations are assumed to be high frequency and short-lived. Larger allocations are assumed to be low frequency and long-lived.
  • Small allocations are very fast and there is safety in numbers. Some programmers are so scared of allocating any memory they will use custom allocators for everything, not realising that a small allocation is simply popping a value off a free list. Making all small allocations go through a single general purpose allocator is usually more efficient than having lots of fixed heaps. Only use your own allocator if you are doing lots of allocations of the same size and lifetime, such as in a particle system.
  • Avoid large short-lived allocations. The coalesce heap will fragment very quickly with short-lived allocations. Use a custom allocator or scratch area of memory for these.
  • Freeing is much slower than allocating. Keep this in mind when deciding when to free memory.
  • Try not to mix long- and short-lived allocations. This is often easier said than done, but a simple solution is to use a custom allocator for small allocations that you know are long-lived or never get freed. Similarly, use a separate allocator for larger, more frequent allocations that you know are not going to stick around.
  • Put a custom allocator behind your string class. Strings are almost always the biggest cause of all types of fragmentation.
  • Coalesce heaps suffer from both spatial and temporal fragmentation. Coalesce heaps are used for larger allocations, and it is expected that the allocation frequency will be lower. Coalesce heaps can fragment very badly. Allocating from coalesce heaps continuously every frame is probably a bad idea. Coalesce heaps are great for packing together varying sizes of allocations, but are poor when allocation frequency is high. Ideally, most large allocations would happen during level load, and would not be touched again while the game is running.

It is better to do certain things up front, such as putting custom allocators behind your string class and particle systems. There are some things that should always be avoided, such as high frequency/short-lived large allocations. Other than that, it is best to periodically profile your game to see what needs attention. A good profiler will be able to tell you where your memory is going, the time spent in the allocators, allocator overhead, and loss due to fragmentation.

Optimising memory usage is a difficult task. It’s good to have general tips in mind whilst programming, but it’s also useful to have a better understanding of memory managers for profiling and fixing memory related problems. I hope this blog has shed some light on that black box of memory allocation.



Comments


Nathan Mates
Back in the PS2 era, our memory manager could be passed a flag as to whether it was a known-temporary allocation, or more permanent. Permanent allocations were skewed towards the bottom of a heap area, temps went to the top end of that heap. This was heavily used in level loading code, which involved a bunch of temps that went away relatively quickly, and we didn't want those holes in the memory footprint.

Granted, every memory allocation scheme can be hurt by really pathological allocation patterns, but if it's not too hard to have an 'allocate high' flag, that would help. These days, just about everything uses something based off of dlmalloc (public domain) or jemalloc. Wish those two had an allocate high flag.

Stewart Lynch
Hi Nathan,

Yes, I've seen this technique used in the past. You kind of get two allocators for the price of one with the benefit that each will resize to use up the available space. It's a good technique for a single allocator with a fixed memory range. It's not so applicable to a memory manager that contains multiple allocators though. The other thing to keep in mind is that most allocators dynamically resize their range, so there is no 'top' memory. You could have a separate temporary allocator for each allocator, but the overhead for this would be fairly high. It might be beneficial in some cases though.

The reason I didn't add a flag to VMem is that most games I work on make millions of allocations, tagging each one would be almost impossible. But it could possibly work for simpler games. If you want to modify VMem it should be pretty easy to pass in a flag and set up some different allocators, it's very modular. Let me know if you would like to try this.

John Woznack
Good write up.

I did nearly this exact memory management for one of the video games I worked on. It had many "FSA" blocks for the small-medium allocations, and defaulted to one general heap ("coalesce" as you put it) for the large allocations.

The 3 differences that I did were:

1. I allocated one large "temp buffer" up front for the game engine to use for large, temporary allocations. (Things like reading in a texture in .tga format so it could be converted to a platform-specific format.) Obviously its use had to be tightly controlled for thread safety.

2. For the new/new[] calls, I wrote a tiny assembly routine per platform that took the allocation size and computed which FSA to use. For the delete/delete[] calls, I wrote a tiny assembly routine per platform that took the memory address being freed and did a very quick memory address comparison/lookup to determine which FSA it "came" from. Those two tiny functions actually made my custom memory manager faster than the native heap malloc and free calls on each platform!

3. During a "profiling" release-candidate run, I could disable the FSAs and default everything to the system heap and then collect stats on the size, usage, and timing of the allocations. That allowed me to create a bell-curve of allocation size usage that I could then use to pre-allocate the FSAs for each game level for the gold master. (I was extremely pleased with the final performance!)

I ended up creating an array of FSAs of sizes ((2^b)/4) for b = [4~16]. It's amazing to discover just how many tiny allocations happen, especially with C++ code that even the software engineers themselves didn't realize caused such small, temporary allocations.

Stewart Lynch
Hi John,
Thanks, glad you like it.

1. Yes, definitely a good idea to have a temp buffer for things like this.
2. Yes, working out which allocator to use or free from can be quite expensive, so it's worth spending the time optimising these. In VMem I use a lookup table for deciding which FSA to allocate from. I just create a table with an entry for each possible FSA allocation size that maps to an FSA pointer. For freeing I simply test if it is in the FSA virtual memory address range; if it is, I clamp the address to the start of the 4K page, which holds a pointer back to the FSA. In both cases it's just a few operations.
3. Yes, it's good to balance the allocators by profiling what is actually going on. In the past I've found that some FSAs were getting hit very hard, so I've sectioned them off from the rest of the system to avoid fragmentation. I don't worry too much about balancing my FSAs because they can all dynamically resize themselves, so I can just chuck as many allocators in there as I want with negligible overhead. If I see that some FSAs are quite sparse I may combine them with another one though.
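(As a rough sketch of the size-to-FSA lookup table idea above — the sizes and names here are illustrative, not VMem's actual code:)

```cpp
#include <vector>

// Rough sketch of a size -> FSA lookup table: one entry per possible
// small allocation size, mapping to the index of the smallest FSA
// whose slots can hold it. Deciding which FSA to allocate from is
// then a single array lookup.
struct FSATable
{
    std::vector<int> m_SizeToFSA;   // allocation size -> FSA index

    explicit FSATable(const std::vector<int>& fsa_sizes)
        : m_SizeToFSA(fsa_sizes.back() + 1)
    {
        int fsa = 0;
        for (int size = 0; size <= fsa_sizes.back(); ++size)
        {
            while (fsa_sizes[fsa] < size)
                ++fsa;              // step up to the next FSA slot size
            m_SizeToFSA[size] = fsa;
        }
    }

    int FSAIndexForSize(int size) const { return m_SizeToFSA[size]; }
};
```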

Chris Nash
Surely you should be using C++11, which bypasses the need for "new" and "delete".

Stewart Lynch
New and delete are still called under the covers even if you don't call them explicitly, so everything I've said applies equally to C++11.

