Data Alignment

When working in C, you will sooner or later use structs. They are a very handy construct, but they have hidden pitfalls. One of them shows up when the sizeof your structure is not what you expected, or changes from one platform to the next. Why is that so?

It all comes down to something called data alignment. Computers today do not read individual bytes from memory; instead they read a word, i.e. a number of bytes that works well for the given CPU architecture. As you can tell, the word size differs between CPU architectures.

So, how does this affect the size of your structs? Take the following struct. Let's, for the sake of this example, assume that sizeof(char) is 1 and that sizeof(char*) and sizeof(int) are both 4.

struct MyStruct {
    char foo;
    char *bar;
    int baz;
};

The naive expectation is that sizeof(struct MyStruct) would be 1+4+4 = 9, but under these assumptions 12 is the more likely result. Look at the illustration to the right. There you can see that the compiler will try to align integers and pointers to word boundaries. This leaves a 3-byte gap of padding between the char and the pointer. The reason is how the CPU reads and writes data. As the interface between the memory and the CPU is word oriented, reading a non-aligned word means reading two words, then masking, shifting and combining the results into the word requested. Reading an aligned word is more straightforward.

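In some settings, space is more important than speed. One can then rearrange the members of the structure so that less padding is needed, which saves space without giving up aligned access. This, combined with the #pragma pack directive (a compiler extension supported by most mainstream compilers), gives you full control of the data alignment.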

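Adding a new character to the structure will demonstrate this more clearly. Reading a single char always means reading a single word, then masking and shifting to reach the requested byte. This means that a char can sit at any byte offset; it does not need to be aligned to a word boundary. Under the 32-bit assumptions above, adding the new member to the end of the structure makes the structure larger: it needs at least one more byte, and the compiler will usually round the size up to the next word boundary so that the elements of an array stay aligned. Adding it at the front simply uses the gap left by aligning the pointer, adding more data without using more space.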

In these examples, we’ve looked at a CPU that works with 32-bit words. Today, 64-bit machines are quite common, as are 16-bit machines, at least in embedded settings. How does this affect alignment? The answer is that it depends on the CPU, and on the conventions your compiler follows for that CPU.
