Data Oriented Design

This article describes the idea behind Data-Oriented Design (DOD for short). It is a useful way of thinking about design of data processing engines, with DOD being a favorite of many modern game engine designs. Why? Because it scales well to having multiple cores perform different types of work on similar structures.

Reading Material

When getting started with DOD, it is useful to review the existing literature out there on the topic. Here a short summary of “essential” reading 🙂

A Practical Example?

While exploring DOD, we didn’t find many simple C examples of DOD, and why it might actually be useful to us. The solution was to get writing some code – and that’s exactly what we will guide you through here today. Nope – no code example – you gotta type it to understand it 🙂

So were going to build a small application that can display a number of pixels on screen. What two “Components” does a pixel have? (In OOP these might be called “properties” or “attributes” of an Object)

  • Location
  • Color

These two components allow us to do DOD very nicely – we can have two lists for processing, one to update locations, another to update colours. Since we know these two processes are not dependant on eachother (location and colour will be assinged randomly – independantly of eachother), we can safely run these two operations in parallel (two threads, two different CPU cores). In a typical OOP design, the fact that the Class will represent both colour and location makes this more difficult.

Data Independance

Why is the independance of the data relevant? Because we can gaurantee that when the colour update core writes to the colours it will not change the location, and vice versa. The following code snippet shows how we can create two “components” for DOD, and combine them to represent an “object” (using OOP terms 🙂

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// Component “location”
struct location_t {
    uint32_t x;
    uint32_t y;
};
// Component “colour”
struct colour_t {
    uint32_t r;
    uint32_t g;
    uint32_t b;
};
// Object built up of “Components” only
struct pixel_t {
    struct location_t loc;
    struct colour_t col;
};

Of course, to make this scale better, we should loop over all items that need their location updated in one loop because this amortizes the I-cache misses across all the items. So re-implement the above functions to accept a TailQ [1] list, which can then be iterated for all items that currently have a “dynamic” location or colour.

There is a little book-keeping involved in DOD – everything that needs to be updated should be added to or removed from lists as appropriate. The advantage is large though: not every object has to be checked for its “needs_update” flag, which saves A) pulling the cache line for the object B) a branch (mis)-predict for no good reason.

Conclusion

DOD is a nice design pattern for data processing applications, particularly when processing stages can be disconnected and split into “components” which can be done in parallel. It is difficult to understand at first, because the way of thinking about “objects” in code changes. The power of thinking in terms of smaller pieces of “work” than need to be performed on data is great for high-performance code that should not stall on cache-misses, and needs to run in parallel.

Footnotes

[1] TailQ: http://ideone.com/D67BHW