
Binary, Decimal, Hex, and ASCII

Inside the computer, everything is binary. Numbers use binary code. Floating-point numbers use a different code from integers, but it is still binary. Characters use a binary code called ASCII code. ASCII stands for American Standard Code for Information Interchange. Each character is represented in the computer by an 8-bit binary number (a byte). Which number to use for each character was decided by a committee. Pretty much everyone uses the same code. That is why almost any printer can be used on almost any computer and you will get correct results. I have made a table of ASCII codes that you can use for reference.

Outside the computer, where humans see the data in programs and in data files, we may use decimal to represent numbers, or we may use hexadecimal (base 16). We may use characters. We almost never use binary representation. Data are always converted to binary when they are put into the computer. You should have learned to convert decimal or hexadecimal to binary somewhere. If the data are characters, then the conversion is done using the ASCII code. Most computers now use "2's complement" arithmetic with negative numbers, and I made notes on that. In any case, when it is in the machine, it is simply binary, and the computer doesn't know the difference between any of these kinds of data.

Hexadecimal is used because conversion from hexadecimal to binary or vice versa is extremely simple. The hexadecimal digits are 0123456789abcdef, and they correspond to the binary numbers that represent 0 to 15. To convert hexadecimal to binary, you simply replace each hexadecimal digit by the corresponding 4 binary digits. To convert binary to hexadecimal, mark off sets of four binary digits starting at the binary point and going in both directions. At each end, add enough zeros to get complete sets of four binary digits. Then replace each set of four binary digits by the corresponding hexadecimal digit. For example, hexadecimal 3e8 corresponds to 0011 1110 1000.
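The digit-by-digit rule above can be sketched in C. This is only a minimal illustration; the function names hex_digit_to_bits and hex_to_binary are made up for this example and handle whole numbers only (no binary point).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the 4 binary digits for one hex digit into
   out (room for at least 5 chars). Returns 0 on success, -1 if c is not
   a hex digit. */
int hex_digit_to_bits(char c, char out[5])
{
    static const char digits[] = "0123456789abcdef";
    const char *p;
    int value, i;

    if (c >= 'A' && c <= 'F') c = (char)(c - 'A' + 'a');  /* accept upper case */
    p = (c != '\0') ? strchr(digits, c) : NULL;
    if (p == NULL) return -1;
    value = (int)(p - digits);          /* position in the string is the value 0..15 */
    for (i = 0; i < 4; i++)
        out[i] = ((value >> (3 - i)) & 1) ? '1' : '0';
    out[4] = '\0';
    return 0;
}

/* Hypothetical helper: replace each hex digit by its 4 binary digits,
   with a space between groups. out needs 5*strlen(hex)+1 chars. */
int hex_to_binary(const char *hex, char *out)
{
    char bits[5];
    size_t n = 0;

    out[0] = '\0';
    for (; *hex != '\0'; hex++) {
        if (hex_digit_to_bits(*hex, bits) != 0) return -1;
        if (n > 0) out[n++] = ' ';
        memcpy(out + n, bits, 4);
        n += 4;
        out[n] = '\0';
    }
    return 0;
}
```

Calling hex_to_binary("3e8", buf) with a large enough buf leaves the string 0011 1110 1000 in buf, matching the example above.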

When you process data, generally you don't mix the data up. If you are processing characters, numbers shouldn't be mixed in with them. Generally, when you add two numbers, they should both be integers or they should both be floating point. Some computer languages keep track of the types of all data and don't allow you to mix data (except possibly under very controlled circumstances). Such languages are called strongly typed. They help you avoid many common errors, but they also make it difficult to do operations where you really want to process two different types of data together. Java is quite strongly typed, in order to make Java as immune to errors as possible. C is somewhere between machine language, with minimal types, and Java, which is quite strongly typed. In C you are allowed to do many things which are not allowed in Java.

In your C program, you may write constants in decimal, e.g. 65. You may write constants in hexadecimal (hex for short), e.g. 0x41. You may write a character constant, e.g. 'A'. The compiler will convert each of these to binary in preparation for putting them into the computer. These three examples all result in the binary number 01000001. Suppose you have declared "char n;". You may write "n = 65;" or "n = 0x41;" or "n = 'A';", and all three of these statements do exactly the same thing--they result in giving n the binary value 01000001. Suppose you have declared "int k;". Then you may write "k = 65;" or "k = 0x41;" or "k = 'A';" and all three statements give precisely the same result. In almost all present day computers, an int is 32 binary digits. The value 01000001 will be put in the rightmost 8 binary digits and the rest will be filled with zeros. Thus k will contain 00000000 00000000 00000000 01000001. Let me repeat--all the data in the computer is binary. Decimal and hex and character representations appear in our programs, because they are easier for humans to use. The compiler always converts them to binary.
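To see the 32-bit picture for yourself, here is a small sketch that writes out all 32 binary digits of a value in groups of eight. The name format_bits32 is hypothetical, and I use uint32_t from <stdint.h> so that the width is exactly 32 bits regardless of how wide int is on your machine.

```c
#include <stdint.h>

/* Hypothetical helper: write the 32 binary digits of value into out,
   most significant bit first, with a space after each group of 8.
   out needs room for 32 digits + 3 spaces + 1 terminator = 36 chars. */
void format_bits32(uint32_t value, char out[36])
{
    int bit, n = 0;

    for (bit = 31; bit >= 0; bit--) {
        out[n++] = ((value >> bit) & 1u) ? '1' : '0';
        if (bit % 8 == 0 && bit != 0) out[n++] = ' ';   /* byte boundary */
    }
    out[n] = '\0';
}
```

Passing 65, 0x41, or 'A' gives the same string in each case: 00000000 00000000 00000000 01000001.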

In C, you can also write such things as k = 'A'+3; or n = 'A'+3. The result in either case will be the binary representation of 65+3 = 68, which is the ASCII code for the letter D. You can read data from a data file in any of those formats and get it converted to binary. You can also print binary data from inside the computer in any of these formats. You cannot easily print in binary, because rarely does anyone want to look at binary. To print in binary, you have to write a short program.

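 #include <stdio.h>

 int a, b, c, d;
 int main(void)
 {
     int i;
     a = 65;        /* decimal constant */
     b = 0x41;      /* hexadecimal constant */
     c = 'A';       /* character constant */
     d = 'A'+3;     /* 65+3 = 68, the ASCII code for D */
     printf("a = %d b = %d c = %d\n", a, b, c);
     printf("Ascii %c, decimal %d, hex %02x\n", d, d, d);
     printf("Here is d in binary: ");
     /* Test each of the low 8 bits, most significant bit first. */
     for(i = 7; i>=0; i--) {
         if((1<<i)&d) printf("1");
         else printf("0");
     }
     printf("\n");
     return 0;
 }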

 Results:

 a = 65 b = 65 c = 65
 Ascii D, decimal 68, hex 44
 Here is d in binary: 01000100

Note that the ASCII codes are all less than 128, so they require only 7 binary digits. When a byte is used to encode them, the first bit is always zero. Some devices, like printers, have graphics defined for the codes greater than 127, but these are not standardized, so you can expect different results on different printers. I consider the codes 128 to 255 (that start with a binary 1) to be unprintable. Then the characters from 32 to 126 inclusive are printable, and those 95 bytes are the only printable characters. (A space is represented by decimal 32--I consider that to be printable even though you can't see the result of printing a space.)
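The definition of "printable" used here is easy to write down as a test. This is a sketch with made-up names (is_printable_ascii, count_printable). The standard library function isprint in <ctype.h> gives the same answer for the codes 0 to 127 in the default "C" locale, but I spell out the range so the rule is visible.

```c
/* Hypothetical helper: 1 if code is printable in the sense used in
   these notes (32 through 126 inclusive), 0 otherwise. */
int is_printable_ascii(int code)
{
    return code >= 32 && code <= 126;
}

/* Count the printable values among all 256 possible bytes. */
int count_printable(void)
{
    int code, count = 0;

    for (code = 0; code <= 255; code++)
        if (is_printable_ascii(code)) count++;
    return count;
}
```

count_printable() returns 95, the number of printable bytes stated above.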

The codes from 0 to 31 are "control characters" designated by ctrl-A or ^A, for example. Typically, you can get ctrl-A by holding down the ctrl key and hitting the A key on a keyboard. These are not printable characters. The code for ^A is gotten by subtracting 64 from the code for A, so ^A is 1, ^B is 'B'-64 or 2, etc. Note that ^G and \a, for example, represent the same value, 7. The escape codes \a, \b, \t, . . . are defined on page 38 of K & R. The most common of these are \n for new line, \r for return, \t for tab, and \b for backspace. These are considered to be unprintable even though the printer knows how to deal with them. \t is the same as ^I, \n is the same as ^J, and \r is the same as ^M.
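The subtract-64 rule can be checked directly in C. The function name ctrl_code is hypothetical, and the sketch assumes the machine uses ASCII, as almost all do.

```c
/* Hypothetical helper: the code for ctrl-<letter>, where letter is one
   of the ASCII characters from '@' (64) through '_' (95).
   Returns -1 for anything outside that range. */
int ctrl_code(int letter)
{
    if (letter < 64 || letter > 95) return -1;
    return letter - 64;     /* 'A' is 65, so ^A is 1 */
}
```

On an ASCII machine, ctrl_code('G') equals '\a', ctrl_code('I') equals '\t', ctrl_code('J') equals '\n', and ctrl_code('M') equals '\r'.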

If you simply print a file that has unprintable characters in it, you cannot tell from looking at the resulting printout exactly what is in the file. Some unprintable characters, like tab, affect what the printer does, and some of them are simply ignored by the printer. (The same is true of displaying characters on the computer display.) Even though you can't see those characters, they may cause a compiler to give incorrect results--usually a syntax error. There are binary files, also, executable files for example, that simply have no relation to the ASCII codes, and if you try to print such a file, the result is totally unreadable, although some of the bytes will, by coincidence, have values in the range of printable characters.

It is frequently important to know exactly what is in a file. Obviously, then, you can't simply print it. There are two ways of handling this situation: using a filter that replaces unprintable characters by escape sequences, or using a file dump program. These are explained at the end of the Assignment 2 writeup.
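To give the flavor of the filter approach, here is a minimal sketch of the core step: turning one byte into something you can see. This is not the program from the Assignment 2 writeup. The name escape_byte and the choice of \xHH for bytes that have no named escape are my own assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write a visible form of byte c into out.
   Printable characters are copied as is, common control characters
   become their C escape sequences, and everything else becomes \xHH.
   out needs room for 4 characters plus the terminator. */
void escape_byte(unsigned char c, char out[5])
{
    switch (c) {
    case '\n': sprintf(out, "\\n");  break;
    case '\r': sprintf(out, "\\r");  break;
    case '\t': sprintf(out, "\\t");  break;
    case '\b': sprintf(out, "\\b");  break;
    case '\a': sprintf(out, "\\a");  break;
    case '\\': sprintf(out, "\\\\"); break;   /* keep the output unambiguous */
    default:
        if (c >= 32 && c <= 126) sprintf(out, "%c", c);
        else sprintf(out, "\\x%02x", (unsigned)c);
    }
}
```

A complete filter would read bytes with getchar until EOF, pass each one through escape_byte, and print the result. A tab then shows up as \t and a ctrl-A as \x01 instead of vanishing from the printout.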