It's free to join Gamasutra!|Have a question? Want to know who runs this site? Here you go.|Targeting the game development market with your product or service? Get info on advertising here.||For altering your contact information or changing email subscription preferences.
Registered members can log in here.Back to the home page.

Search articles, jobs, buyers guide, and more.

By Alexandre Macris and
Pascal Urro

Gamasutra
April 9, 1999

Letters to the Editor:
Write a letter
View all letters


Features

Leveraging the Power of Cache Memory

Contents

Introduction

Example 1: A 16-bit floating-point Z-buffer

Example 2: Managing Big Triangles with Large Texture Maps

When hardware engineers presented us software developers with cache memory for the first time, essentially they gave us a black box. "Don’t worry about what’s inside of it," they assured us, "just trust us that without changing your programming habits, your programs are going to run much faster!" That was the goal: free software performance enhancement. While it worked fine for a while, over the years the rift between processor speed and memory access time became greater, and it became obvious that by opening the black box of cache memory and exploring the contents within, we’d find solutions to the widening performance gap.

The first steps into demystifying cache memory were modest, and the first tips that resulted from these explorations were vague. For instance, some of the first tips about making the best use of cache memory were non-specific suggestions such as, "avoid random memory access" and "keep your data local". With the appearance of dissociated data and code caches, new suggestions arose, like "keep your data and code strictly separated." The "allocate on write" technique came with its own set of advice. These days, the manual prefetch is in vogue, which takes advantage of dedicated instructions introduced with the new generation of Intel, AMD, PowerPC, and Mips processors. Unfortunately, these basic tips aren’t really satisfactory if you are serious about trying to optimize your game. As a result, the major new code optimization challenge facing game developers today has cache memory optimization.

Simply blindly applying vague suggestions like those to your game, without knowing their basis for operation, will not give you fully optimized code. In fact, it’s easy to find instances where strict code optimization and cache memory optimization conflict. Let’s look at an example.

Assume that you are computing a true-color image. You have the choice of representing your image using either a 24-bit RGB data structure or a 32-bit structure using an idle byte. Your first reflex would probably be to select the second solution for one simple reason: all your memory access will be 4-byte aligned, avoiding the three (at least) processor cycles needed to handle misaligned access. Usually, that’s the right choice, but if you’re using cache memory, it’s not always the way to go.

The technique you should use is implementation dependant. If you’re only going to look at the data once, a 24-bit structure may be more efficient than the byte-aligned 32-bit structure, but the opposite is true if the data is being accessed multiple times. Why? Because on the first pass, the image data is absent from the cache (assuming that we’re working on a processor without a prefetch instruction). Accessing the data from RAM and fetching that data into cache are expensive enough operations as to make the misalignment penalty insignificant. What we’re worried about during the first pass is not the shape of the data per se, but the raw amount of data that has to be moved. So to reduce the amount of data we need to fetch, we should use a 24-bit data structure as opposed to the 32-bit alternative.

At this point let me clarify something about optimizing your game for cache memory. The main handicap of cache memory is its lack of determinism and the difficulty we encounter when trying to tune its performance. Of course, Pentium event counters provide some useful help in this regard, but not enough to strictly optimize your code. Therefore the only real solution is, as usual, hard work. To that end I recommend purchasing a good book on the subject, such as The Cache Memory Book by Jim Handy (Academic Press, 1998).

This hard work will be rewarded, however. To convince you that the results of properly handling cache issues justify the added programming complications, let’s take a look at two detailed, real-life examples. The first example will convince you of one simple thing: accessing uncached data is expensive. The second (and more interesting) example will show you that sometimes changing a very small aspect of your game can solve a huge performance problem, provided you look deeply enough into the cache structure. There isn’t much actual code in these examples – they are primarily meant to illustrate that understanding and optimizing your data structures for cache can result in significant performance improvements.


Example 1: A 16-bit floating-point Z-buffer


join | contact us | advertise | write | my profile
news | features | companies | jobs | resumes | education | product guide | projects | store



Copyright © 2003 CMP Media LLC

privacy policy
| terms of service