Ones and Zeros

The root of digital logic is being able to differentiate between something and not something. From this little seed grew basically the entire information age. You’ve probably heard it before: a computer is just ones and zeros, but let’s explore this a bit (ha) in the context of programming and data.

Bits of Data

Data ultimately consists of ones and zeros (called bits) stored in memory, and we can control how those bits are interpreted using different types in a programming language. Let’s look at an example in C.

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t data = -1;

    // Print data as a signed 8-bit decimal number.
    printf("int8_t:  %d\n", data);

    // Print data as an unsigned 8-bit decimal number.
    printf("uint8_t: %d\n", *(uint8_t*)&data);
}
int8_t:  -1
uint8_t: 255

Let’s break down what’s happening here.

First we create a variable data that is 1 byte (8 bits) large and store -1 into it.

int8_t data = -1;

But what is -1? Well, most computers represent -1 in ones and zeros as all 1s in a scheme called two’s complement. That’s sort of out of scope for this post, but the gist is that representing negative numbers this way lets the same binary addition and subtraction machinery work for both positive and negative numbers.

Anyway, since the variable is 1 byte large, the bits will be 8 ones: 11111111.
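Don’t take my word for it, though. Here’s a minimal sketch that dumps the bit pattern of data (it uses a pointer trick we’ll break down in a moment):

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t data = -1;
    uint8_t bits = *(uint8_t*)&data;

    // Walk the bits from most significant to least significant.
    for (int i = 7; i >= 0; i--) {
        putchar((bits >> i) & 1 ? '1' : '0');
    }
    putchar('\n'); // prints 11111111
}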

We then tell printf to print this data in two different ways: as a signed integer and as an unsigned integer.

printf("int8_t:  %d\n", data);
printf("uint8_t: %d\n", *(uint8_t*)&data);

The whole *(uint8_t*)&data is a bit strange if you’re unfamiliar with C, but here’s what it’s doing (from right to left).

&data       : Take the address of data (this is a pointer)
(uint8_t*)  : Ignore whatever type that pointer was pointing to, you're
              now pointing to a uint8_t type
*           : Get the actual data that we are pointing at as a uint8_t

So that’s cool and all, but like, why not just make a new variable and assign it the value of data? Well, the point I’m trying to make is that the underlying data is the same. There are no hidden conversions happening. We are only changing how the program interprets the ones and zeros.
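To make that concrete, here’s a small sketch contrasting a plain assignment (a value conversion) with the pointer cast (a reinterpretation). Widening to int16_t makes the difference visible:

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t data = -1;

    // Value conversion: C preserves the value -1, so the bits
    // change to 0xffff (sign extension).
    int16_t converted = data;

    // Reinterpretation: the bits stay 11111111, so the value
    // changes to 255.
    uint8_t reinterpreted = *(uint8_t*)&data;

    printf("converted:     %d (bits 0x%hx)\n", converted, (uint16_t)converted);
    printf("reinterpreted: %u\n", reinterpreted);
}

Assignment keeps the value and changes the bits; the cast keeps the bits and changes the value.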

Floating Around

Let’s take this a step further. Did you know float in C is 32 bits on just about every platform you’ll run into?

#include <stdio.h>
#include <stdint.h>

int main() {
    float data = 1337.0;

    printf("float:    %f\n", data);
    printf("uint32_t: 0x%x\n", *(uint32_t*)&data);
}
float:    1337.000000
uint32_t: 0x44a72000

The keen reader will notice that 0x44a72000 is not equal to 1337 at all. Or rather, it is, but under a different interpretation. You see, 0x44a72000 is the IEEE 754 floating point representation of 1337.0. Those are the underlying ones and zeros for that number: read as an integer they are not 1337, but read as an IEEE floating point number they are exactly 1337.0.

Wow that made no sense at all. Here, just stare at this until you become one with the universe.

uint32_t data = 0x44a72000;
printf("%s\n", 1337.0 == *(float*)&data ? "true" : "false");
true
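And if you’re wondering how 0x44a72000 actually encodes 1337.0, here’s a small sketch that pulls the sign, exponent, and mantissa fields apart by hand, assuming the standard IEEE 754 single-precision layout:

#include <math.h>
#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t bits = 0x44a72000;

    uint32_t sign     = bits >> 31;          // 0 (positive)
    uint32_t exponent = (bits >> 23) & 0xff; // 137, biased by 127
    uint32_t mantissa = bits & 0x7fffff;     // 0x272000

    // value = (-1)^sign * (1 + mantissa / 2^23) * 2^(exponent - 127)
    double value = (sign ? -1.0 : 1.0)
                 * ldexp(1.0 + mantissa / 8388608.0, (int)exponent - 127);

    printf("%f\n", value); // 1337.000000
}

Three fields: a sign bit, a power of two, and a fractional part. That’s the whole trick.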

This manipulation of floating point numbers through their underlying bits is probably most famous from the fast inverse square root function in Quake III Arena.

i  = * ( long * ) &y;           // evil floating point bit level hacking
i  = 0x5f3759df - ( i >> 1 );   // what the fuck?
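A quick aside: pointer casts like *(uint32_t*)&data (and Quake’s long pun above) technically violate C’s strict-aliasing rules. If you ever want the same bit-for-bit reinterpretation without the undefined behavior, memcpy is the well-defined way to do it, and compilers are smart enough not to emit an actual copy:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main() {
    float f = 1337.0f;
    uint32_t bits;

    // Copy the raw bytes instead of casting pointers.
    memcpy(&bits, &f, sizeof bits);

    printf("0x%x\n", bits); // 0x44a72000
}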

All this to say, how we interpret data is kind of arbitrary. Standards were made to facilitate interoperability and communication. ASCII, first published in the 1960s, is one such standard.
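In ASCII, for example, the single byte 0x41 is the letter A or the number 65, depending entirely on how you ask printf to show it:

#include <stdio.h>

int main() {
    char c = 0x41;

    // Same byte, two interpretations.
    printf("as a character: %c\n", c); // A
    printf("as a number:    %d\n", c); // 65
}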

Actually, this “interpretation is in the eye of the beholder” is the idea behind deniable encryption.

Deniable encryption makes it impossible to prove the existence of the plaintext message without the proper decryption key. This may be done by allowing an encrypted message to be decrypted to different sensible plaintexts, depending on the key used. This allows the sender to have plausible deniability if compelled to give up their encryption key.

Okay, it’s not exactly the same, but the gist is that a collection of bits can be interpreted in more than one way. Speaking of which, I’m just going to leave these floats here…

0x1.dcde86p+79
0x1.e8c2e4p+107
0x1.e8c2d8p+83
0x1.e6dcdep-61
0x1.e8d04p+105
0x1.74e6ep-33
0x1.d2da5ep+71
0x1.d0c2c6p-35
0x1.5eded2p+67
0x1.cae6e6p+105
0x1.da5ee6p+115
0x1.cae8e6p+101
0x1.e65af2p+95
0x1.caecd8p+73
0x1.dce05cp+79
Hint
{% raw %}
#include <stdio.h>

int main() {
    // The secret message, padded with NULs so every 4-byte read
    // below stays inside the string.
    char *msg = "Wouldn't you like to know, weather boy?\0\0\0";

    // Reinterpret each 4-byte chunk as a float and print it in
    // hex float notation (%a).
    for (char *c = msg; *c != 0; c += 4) {
        printf("%a\n", *(float*)c);
    }
}
{% endraw %}

Conclusion

Understanding the underlying ones and zeros of data in your program isn’t always important, but it is certainly nice to know when working on high-performance or memory-constrained systems. Manipulating the interpretation of data like this is weirdly one of my favorite stupid things to do in C. I love that you can peel back the veil and see through the matrix to what the data really is, however pointless that might be.

P.S. After writing this post, I realized a C union might’ve been a better choice for demonstrating “the data is the same and the interpretation is different” because that’s literally the whole point of a union!
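For the curious, a quick sketch of what that would look like:

#include <stdio.h>
#include <stdint.h>

int main() {
    // Both members share the same 4 bytes of storage.
    union {
        float f;
        uint32_t u;
    } data;

    data.f = 1337.0f;
    printf("float:    %f\n", data.f);
    printf("uint32_t: 0x%x\n", data.u); // 0x44a72000
}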