A little extension on how bytes are stored (material from 61C)
Created by Yunhao Cao (Github@ToiletCommander) for Fall 2021 CS61B.
OK, so I found this very, very interesting in my CS61C class. I have also worked with TCP communication for a long time, since I am a very heavy full-stack web developer and a lot of the early HTTP protocols are built on top of TCP, so I hope this is fun for you too 😉
We mentioned the difference between a bit and a byte: a byte is simply made up of 8 bits. We also know that the MSB (Most Significant Bit) and the LSB (Least Significant Bit) are, respectively, the leftmost and the rightmost bit of a binary representation.
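As a quick illustration, here is a minimal Python sketch that pulls the MSB and the LSB out of a single byte with a shift and a mask. The helper names `msb_bit` and `lsb_bit` are hypothetical, made up just for this example.

```python
def msb_bit(byte: int) -> int:
    """Leftmost (most significant) bit of an 8-bit value."""
    return (byte >> 7) & 1   # shift bit 7 down to position 0


def lsb_bit(byte: int) -> int:
    """Rightmost (least significant) bit of an 8-bit value."""
    return byte & 1          # keep only bit 0


b = 0b0101_1111              # 0x5F: leftmost bit is 0, rightmost bit is 1
print(msb_bit(b), lsb_bit(b))  # 0 1
```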
To set up our new idea of MSB and LSB in units of bytes, we need a new example. Take the decimal number 1048230495, which is 0x3E7ABA5F in hexadecimal. If we represent it as a 32-bit integer in memory, we can also write it in binary as 0b0011.1110.0111.1010.1011.1010.0101.1111 (the dots just separate groups of 4 bits).
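To double-check that these three notations describe the same value, a small Python sketch can convert between them. The helper `dotted_binary` is hypothetical; it only mimics the 4-bit dot grouping used in this note.

```python
def dotted_binary(value: int, width: int = 32) -> str:
    """Format value as `width` binary digits, grouped in fours with dots."""
    bits = f"{value:0{width}b}"                       # zero-padded binary string
    groups = [bits[i:i + 4] for i in range(0, width, 4)]
    return "0b" + ".".join(groups)


value = 1048230495
print(f"0x{value:08X}")       # 0x3E7ABA5F
print(dotted_binary(value))   # 0b0011.1110.0111.1010.1011.1010.0101.1111
```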
For units of bytes, hexadecimal is very convenient because every two hexadecimal digits represent exactly one byte (each hex digit is 4 bits). Here we define the MSB (Most Significant Byte) as the leftmost two digits of the hexadecimal representation, and the LSB (Least Significant Byte) as the rightmost two digits. So in the example above, 0x3E is our MSB and 0x5F is our LSB.
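In code, picking out an individual byte is a matter of shifting and masking with 0xFF. Here is a minimal Python sketch, where `byte_at` is a hypothetical helper that numbers the bytes of a 32-bit value from the LSB (index 0) up to the MSB (index 3).

```python
def byte_at(value: int, index: int) -> int:
    """Byte `index` of a 32-bit value, counting from the LSB (0) to the MSB (3)."""
    return (value >> (8 * index)) & 0xFF   # move the byte down, keep 8 bits


x = 0x3E7ABA5F
msb = byte_at(x, 3)   # 0x3E, the leftmost two hex digits
lsb = byte_at(x, 0)   # 0x5F, the rightmost two hex digits
print(f"MSB = 0x{msb:02X}, LSB = 0x{lsb:02X}")  # MSB = 0x3E, LSB = 0x5F
```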
We know the concept of MSB and LSB for bytes now, but how does it relate to how data is stored in memory? This is where the concept of endianness comes in.
We know that memory has sequentially numbered addresses. For example, in an array x of 32-bit (4-byte) ints, x[0] starts at some address i. Since the memory address is incremented by 1 for every byte, x[1] starts at address i+4, x[2] at i+8, and so on. This is important, because memory addresses lay the foundation for understanding many CS61C concepts.
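We can observe this spacing with Python's `ctypes` module, which creates C-style arrays and exposes their raw memory. This is a sketch assuming a platform with 8-bit bytes; the array `x` mirrors `int32_t x[3]` in C. Since each element takes 4 bytes, x[1] begins 4 bytes after x[0].

```python
import ctypes

x = (ctypes.c_int32 * 3)()                  # like `int32_t x[3];` in C, zero-filled
x[1] = 0x3E7ABA5F

i = ctypes.addressof(x)                     # address of x[0]
elem_size = ctypes.sizeof(ctypes.c_int32)   # 4 bytes per element

# Reading a 32-bit int at address i + 1 * 4 gives back x[1].
x1 = ctypes.c_int32.from_address(i + 1 * elem_size).value
print(elem_size, hex(x1))                   # 4 0x3e7aba5f

# In the raw bytes, x[0] and x[2] stay zero; x[1] fills offsets 4..7.
raw = bytes(x)
print(raw[0:4], raw[4:8], raw[8:12])
```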
There are two types of endianness in modern computers: little-endian and big-endian. The majority of the electronics we use (e.g., x86 machines and, in practice, most ARM devices) are little-endian, while network protocols usually use big-endian order, which is why big-endian is also called network byte order. When the CPU stores a multi-byte primitive type supported by its instruction set architecture (ISA), the endianness determines whether the MSB (Most Significant Byte) or the LSB (Least Significant Byte) goes at the lowest address. On a little-endian machine, the LSB is stored at the smallest address and the MSB at the largest (furthest) address. On a big-endian machine, the MSB is stored at the smallest address and the LSB at the largest address.
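A classic way to find out which kind of machine you are on is to store a small number in a multi-byte integer and look at the byte at its lowest address. Here is a Python sketch that does that pointer cast with `ctypes`; the function name `host_is_little_endian` is hypothetical, and `sys.byteorder` is Python's own report of the same fact. The last line uses `int.to_bytes(..., "big")` to lay a value out in network (big-endian) byte order regardless of the host.

```python
import ctypes
import sys


def host_is_little_endian() -> bool:
    n = ctypes.c_uint32(1)   # value 0x00000001: MSB is 0x00, LSB is 0x01
    # Reinterpret the address of n as a pointer to single bytes.
    first = ctypes.cast(ctypes.pointer(n), ctypes.POINTER(ctypes.c_uint8))[0]
    # Little-endian keeps the LSB (0x01) at the smallest address;
    # big-endian keeps the MSB (0x00) there.
    return first == 1


print(host_is_little_endian(), sys.byteorder)

# Network byte order is big-endian: MSB first, whatever the host is.
print((0x3E7ABA5F).to_bytes(4, "big").hex(" "))  # 3e 7a ba 5f
```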
But don't be confused! Although the bytes are stored in a different order, the order of bits within each byte stays the same. So let's finally look at our example, 1048230495 (0x3E7ABA5F, or 0b0011.1110.0111.1010.1011.1010.0101.1111), and say we store it starting at address i.
If this was stored on a little-endian system, we would have a memory layout like this:
Mem Addr | i         | i+1       | i+2       | i+3       |
Hex      | 5F        | BA        | 7A        | 3E        |
Bin      | 0101.1111 | 1011.1010 | 0111.1010 | 0011.1110 |
But if it was stored on a big-endian system, we would have a memory layout like this:
Mem Addr | i         | i+1       | i+2       | i+3       |
Hex      | 3E        | 7A        | BA        | 5F        |
Bin      | 0011.1110 | 0111.1010 | 1011.1010 | 0101.1111 |
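Python's `int.to_bytes` can produce both layouts without depending on the host machine, so we can regenerate the two tables above. This is a sketch; `memory_layout` is a hypothetical helper, and the address labels are relative to the starting address i. The binary pattern of each individual byte (5F is always 0101.1111) is the same in both orders; only the byte positions change.

```python
def memory_layout(value: int, byteorder: str) -> list:
    """(address, hex, binary) for each byte of a 32-bit value stored at i."""
    rows = []
    for offset, byte in enumerate(value.to_bytes(4, byteorder)):
        addr = "i" if offset == 0 else f"i+{offset}"
        bits = f"{byte:08b}"                      # 8 bits, zero-padded
        rows.append((addr, f"{byte:02X}", f"{bits[:4]}.{bits[4:]}"))
    return rows


for order in ("little", "big"):
    print(f"{order}-endian:")
    for addr, hx, bn in memory_layout(0x3E7ABA5F, order):
        print(f"  {addr:<4}| {hx} | {bn}")
```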
This gets even more fun when you use pointers of different types to access the same memory locations!
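For a taste of that, here is a Python sketch that uses `ctypes.cast` to view the same four bytes through pointers to 8-bit and 16-bit unsigned integers, roughly analogous to casting a `uint32_t *` to another pointer type in C. The values you read depend on the host's endianness; the output comments assume a little-endian host, matching the first table.

```python
import ctypes
import sys

n = ctypes.c_uint32(0x3E7ABA5F)
p32 = ctypes.pointer(n)

# Same memory, viewed through two other pointer types.
p8 = ctypes.cast(p32, ctypes.POINTER(ctypes.c_uint8))
p16 = ctypes.cast(p32, ctypes.POINTER(ctypes.c_uint16))

print(sys.byteorder)
print([f"{p8[k]:02X}" for k in range(4)])   # little-endian: ['5F', 'BA', '7A', '3E']
print([f"{p16[k]:04X}" for k in range(2)])  # little-endian: ['BA5F', '3E7A']
```

On a big-endian host the same code would print the bytes as 3E, 7A, BA, 5F and the 16-bit halves as 3E7A, BA5F.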
And if you like this kind of thing, CS61C is absolutely the place to go: it teaches you things like memory layout, processor ISAs, processor implementation, parallel computing, etc. It's the course I enjoyed the most among all the courses I've taken this year (CS70, 61B, 61C, and EE16B)!