DATA REPRESENTATIONS

 

The purpose of a computer is to process, store, and communicate information. The power of a computer is in its ability to process, store, and communicate huge amounts of information rapidly and reliably. Of course, "information" is an abstraction; what a computer actually processes, stores, and communicates are electrical quantities that we humans use to represent information. In a digital computer, electrical quantities (voltage levels) are used to represent two logical states, 0 and 1. Each individual logical value (0 or 1) is referred to as a bit (binary digit). A sequence of bits is a binary string. Binary strings may be used as codes for many different types of information. The next section provides an example of how this may be done.

            For the most part, we will be interested in 8-bit, 16-bit, and 32-bit strings. An 8-bit string is universally called a byte. The term word has various meanings, but here it will refer to a 16-bit string. A 32-bit string is a double word. It is customary to number the bits in a binary string from right to left, beginning with bit #0, the least significant bit (LSB). The leftmost bit (bit #7 of a byte, bit #15 of a word) is the high-order, or most significant, bit (MSB). The 8-bit binary code 10101010 is shown in Figure 2.1.

 

 

Bit number:   7    6    5    4    3    2    1    0
Bit value:    1    0    1    0    1    0    1    0
              MSB                                LSB

FIGURE 2.1

 

 

2.1 A COMMUNICATION EXERCISE

 

This section introduces an imaginary exercise involving you and three friends: Alice, Bill, and Chuck. Each participant has a copy of the same list of four "yes/no" questions. Once each minute you are to select one of your friends and ask him or her one of the questions from the list. Thirty seconds later, that friend is to communicate the answer to you. The problem is that each of you is in a separate room, and all communication must be accomplished with electric signals, using a 12-volt car battery, some electric wire, and some voltmeters. Figure 2.2 represents one possible solution. A pair of wires is attached to the battery terminals, with branches to each player. This wiring configuration is called the power bus. Two additional pairs of wires, the address bus and the data bus, also have branches to each participant. You may connect each wire at your end of the address and data busses to either of the power lines. In the figure, you have imposed 0V, 12V on the address bus (reading from left to right) and 12V, 0V on the data bus. Your friends have voltmeters that let them determine which voltage levels you have placed on the address and data bus lines. Alice, Bill, and Chuck can also impose voltage levels on the data bus (but not on the address bus). Timing must be handled carefully so that no two participants try to put voltages on the data bus at the same time: two participants connecting the same wire to opposite battery terminals would cause a short circuit. In the figure, you have connected the data bus to the power supply, so Alice, Bill, and Chuck must not.

 

 

 

 

 

 

 

 

 

FIGURE 2.2

 

 

            Now the address and data busses are each capable of presenting up to four different voltage combinations and these combinations can be treated as codes for information. The address bus voltage levels will be codes for "Alice", "Bill", and "Chuck" while the data bus voltage levels will be treated either as codes for questions by you or as codes for answers by your friends. Table 2.1 gives one possible coding scheme. Assuming that this scheme is used, Figure 2.2 shows you asking Bill question #3.

 


 

Voltage levels on bus      Interpretation of voltages on busses
Wire #1    Wire #2         Address Bus   Data Bus (question)   Data Bus (answer)
0 volts    0 volts         Alice         question #1           "No"
0 volts    12 volts        Bill          question #2           none
12 volts   0 volts         Chuck         question #3           none
12 volts   12 volts        none          question #4           "Yes"

                                                            Table 2.1

 

            Remark

Although 0 volts and 12 volts are used in the above game, there is nothing special about those particular values. All that is required is two voltage levels (logical 0 and logical 1) that can be reliably distinguished. Taking 0 volts as logical 0 and 12 volts as logical 1, the addresses provided by Table 2.1 are 00 (Alice), 01 (Bill), and 10 (Chuck). The question codes are 00, 01, 10, and 11. The answer codes are 00 ("no") and 11 ("yes").
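To make the coding scheme of Table 2.1 concrete, here is a minimal Python sketch of it as lookup tables. It is an illustration only: the names `ADDRESS`, `QUESTION`, `ANSWER`, `to_volts`, and `ask` are hypothetical, and the sketch assumes that wire #1 carries the leftmost bit of each code.

```python
# Hypothetical encoding of Table 2.1 (0 volts = logical 0, 12 volts = logical 1).
ADDRESS = {"Alice": "00", "Bill": "01", "Chuck": "10"}  # code 11 is unused
QUESTION = {1: "00", 2: "01", 3: "10", 4: "11"}
ANSWER = {"00": "No", "11": "Yes"}                       # 01 and 10 are unused


def to_volts(code: str) -> tuple:
    """Translate a 2-bit code (wire #1 first) into the two wire voltages."""
    return tuple(12 if bit == "1" else 0 for bit in code)


def ask(friend: str, question: int) -> tuple:
    """Voltages you impose on (address bus, data bus) to ask a question."""
    return to_volts(ADDRESS[friend]), to_volts(QUESTION[question])


print(ask("Bill", 3))  # ((0, 12), (12, 0)), the situation in Figure 2.2
```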

 

            EXERCISES

            2.1       What changes need to be made if: (a) your friend Don were to join the game, (b) you were to add an additional "yes/no" question to your list, (c) you were to ask "what color is your shirt?" where the possible answers are "red", "orange", "yellow", "green", "blue", or "purple"?

 

            2.2       What would be the effect on the game if we were to expand the address bus to 6 bits and the data bus to 4 bits?

 

 

2.2 INTEGER REPRESENTATIONS

 

Computers compute. This requires a method of using binary strings to represent numerical values. Before discussing such representations, it is worthwhile to think about the decimal system that we are all familiar with. The numeral '3039' uses the digits '3', '0', and '9', together with their positions in the string '3039' (namely 9 in the 1's position, 3 in the 10's and 1000's positions, and 0 in the 100's position), to encode the numerical value. Had we evolved with one finger on each hand instead of 5, we might use 1101 to represent the value we call 13, with 1 in the 1's, 4's, and 8's positions and 0 in the 2's position. In the base-2 system, each bit has a positional value that is exactly double the value of the bit on its right. The decimal (base-10) system and the base-2 system are both positional in the sense that decimal or binary digits are given weight according to their positions in the string in which they occur.

            One distinction between the binary representation systems used by computers and the numeric representations used by people is that a computer coding system is associated with a fixed size (number of bits); this may be a byte (8 bits), a word (16 bits), a double word (32 bits), or any other size.

            A distinction is made between coding systems for signed integers (positive and negative) and coding systems for unsigned integers (nonnegative). The same binary string may represent the number 255 when one system is used, but -1 when another system is used.

 

2.2.1 Unsigned Integers

When binary strings (of a fixed length) are used to represent unsigned integers, they are treated as base-2 values. If $b_0, b_1, \ldots, b_k$ are bits, then the binary code $b_k b_{k-1} \ldots b_1 b_0$ represents the integer

$$\sum_{i=0}^{k} b_i \, 2^i$$

where $2^i$ is the positional value for bit $b_i$.

 

            Example 2.1   What unsigned integer is represented by these 8-bit binary codes?

(a) 00101001 (b) 01000001

            Solution (a) $2^0 + 2^3 + 2^5 = 41$. (b) $2^0 + 2^6 = 65$.
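The positional formula for unsigned integers can be evaluated directly. In this minimal Python sketch (the function name `unsigned_value` is hypothetical), the code is a string written MSB first, so bit $b_i$ is the i-th character counting from the right, starting at 0.

```python
def unsigned_value(code: str) -> int:
    """Return the sum of b_i * 2**i for the binary string b_k ... b_1 b_0."""
    total = 0
    for i, bit in enumerate(reversed(code)):  # i = 0 is the LSB (rightmost bit)
        total += int(bit) * 2**i
    return total


print(unsigned_value("00101001"))  # 41, Example 2.1(a)
print(unsigned_value("01000001"))  # 65, Example 2.1(b)
```

Python's built-in `int(code, 2)` returns the same value; the loop is spelled out only to mirror the sum term by term.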

 

            Example 2.2 Represent the unsigned number 69 as an 8-bit binary code.

Solution. The base-2 representation for 69 can be obtained as follows. Let Q = 69. Successively divide Q by 2 to obtain a quotient (the new value for Q) and a remainder R (either 0 or 1). Terminate this process when Q becomes 0.

                                                Q         R

                                                69       

                                                34        1(LSB)

                                                17        0

                                                 8         1

                                                 4         0

                                                 2         0

                                                 1         0

                                                 0         1

The values of R then form the base-2 representation of 69, beginning with the LSB (the first remainder) and ending with the MSB (the last remainder). Read from the MSB down, they give 1000101; padding with enough 0's on the left produces the 8-bit binary representation of 69, namely 01000101.
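The repeated-division procedure of Example 2.2 can be sketched in Python as follows. The function name `to_unsigned_code` and its default width of 8 bits are assumptions made for illustration.

```python
def to_unsigned_code(value: int, width: int = 8) -> str:
    """Build the width-bit unsigned code for value by repeated division by 2."""
    if not 0 <= value < 2**width:
        raise ValueError(f"{value} does not fit in {width} unsigned bits")
    remainders = []
    q = value
    while q != 0:
        q, r = divmod(q, 2)            # new quotient Q and remainder R
        remainders.append(str(r))      # the first remainder is the LSB
    bits = "".join(reversed(remainders))  # last remainder (the MSB) first
    return bits.rjust(width, "0")      # pad on the left with 0's


print(to_unsigned_code(69))  # 01000101
```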

 

            Example 2.3 What is the largest unsigned integer that can be represented by 8 bits?

Solution  255. (This corresponds to binary code 11111111).

 

 

2.2.2 Signed Integers

A portion of any signed integer representation must be used for sign information. Usually, the high-order bit (MSB) is used for this purpose, with 0 denoting a positive value and 1 a negative value. For this reason, the MSB is often called the sign bit. Three binary representation systems for signed integers are sign-magnitude representation, 1's complement representation, and 2's complement representation.

           

2.2.2.1 Sign-Magnitude Representation

This easily understood system uses a sign bit (0 for positive, 1 for negative) followed by the base 2 representation of the magnitude of the number being represented. For example, an 8-bit sign-magnitude representation uses 1 bit for the sign and 7 bits for the magnitude of the number.

 

            Example 2.4    What is the 8-bit sign-magnitude representation for (a) 65 (b) -65?

            Solution (a) 01000001. The sign bit is 0 (since the number is positive). The 7-bit magnitude is 1000001 ($= 2^0 + 2^6$). (b) The result is the same as in part (a) except for the sign bit. In other words, the answer is 11000001.
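A minimal Python sketch of 8-bit sign-magnitude encoding, assuming values are given as Python integers; the name `sign_magnitude` is hypothetical. The magnitude must fit in the 7 bits that follow the sign bit.

```python
def sign_magnitude(value: int, width: int = 8) -> str:
    """Sign bit (0 = positive, 1 = negative) followed by a (width-1)-bit magnitude."""
    magnitude = abs(value)
    if magnitude > 2**(width - 1) - 1:
        raise ValueError(f"{value} does not fit in {width}-bit sign-magnitude")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{width - 1}b")


print(sign_magnitude(65))   # 01000001
print(sign_magnitude(-65))  # 11000001
```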

 

2.2.2.2 1's Complement Representation

 The 1's complement of a binary string is formed by changing all 0's to 1's and all 1's to 0's. In this system, a positive integer is represented as an unsigned integer with sign bit 0. A negative integer is represented by the 1's complement of the corresponding positive integer.

 

            Example 2.5 What is the 8-bit 1's complement  representation for (a) 65 (b) -65?

            Solution (a) 01000001 ($= 2^0 + 2^6$).

(b) 10111110, the 1's complement of the result in part (a).

 

            Remark

            The 1's complement of a bit $b_i$ can be calculated as $1 - b_i$, so the 1's complement of 01000001 can be calculated as 11111111 - 01000001.
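The 1's complement operation and the 1's complement representation can be sketched in Python as follows. The function names are hypothetical; the second function follows the remark above, computing the complement by subtracting from a string of 1's.

```python
def ones_complement(code: str) -> str:
    """Change every 0 to 1 and every 1 to 0."""
    return "".join("1" if bit == "0" else "0" for bit in code)


def ones_complement_by_subtraction(code: str) -> str:
    """Same result, computed as 11...1 - code."""
    width = len(code)
    return format((2**width - 1) - int(code, 2), f"0{width}b")


def ones_complement_repr(value: int, width: int = 8) -> str:
    """1's complement representation of a signed integer."""
    if abs(value) > 2**(width - 1) - 1:
        raise ValueError(f"{value} does not fit in {width}-bit 1's complement")
    code = format(abs(value), f"0{width}b")  # unsigned code of the magnitude
    return ones_complement(code) if value < 0 else code


print(ones_complement_repr(65))   # 01000001
print(ones_complement_repr(-65))  # 10111110
```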

 

2.2.2.3 2's Complement Representation

Before discussing this coding system, it is necessary to look at the operation of incrementing (adding 1 to) a k-bit binary string, obtaining a k-bit result.

 

            Example 2.6

            (a) 01010000 + 1 = 01010001

            (b) 10101111 + 1 = 10110000

            (c) 11111111 + 1 = 00000000 (the carry out of bit #7 is discarded, since the result must have 8 bits)

 

            The 2's complement of a k-bit binary string is formed by incrementing the 1's complement of the string. The 2's complement representation of a positive integer in this system is the same as the unsigned representation.  A negative integer is represented by the 2's complement of the unsigned representation of its magnitude.

 

            Example 2.7    What is the 8-bit 2's complement  representation for (a) 65 (b) -65?

            Solution (a) 01000001, the same as the sign-magnitude or 1's complement representation. (b) To get the solution, find the base-2 representation of 65 (01000001), form its 1's complement (10111110), and add 1. Hence the solution is 10111111.
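The k-bit increment and the 2's complement operation can be sketched as below. This is an illustrative Python sketch with hypothetical function names; the `% 2**width` step models the fixed k-bit result, so the carry out of the MSB in a case such as 11111111 + 1 is discarded.

```python
def increment(code: str) -> str:
    """Add 1 to a k-bit string, keeping k bits (a carry out of the MSB is lost)."""
    width = len(code)
    return format((int(code, 2) + 1) % 2**width, f"0{width}b")


def twos_complement(code: str) -> str:
    """Increment the 1's complement of the string."""
    flipped = "".join("1" if bit == "0" else "0" for bit in code)
    return increment(flipped)


def twos_complement_repr(value: int, width: int = 8) -> str:
    """2's complement representation of a signed integer."""
    if not -2**(width - 1) <= value <= 2**(width - 1) - 1:
        raise ValueError(f"{value} does not fit in {width}-bit 2's complement")
    magnitude = format(abs(value), f"0{width}b")
    return twos_complement(magnitude) if value < 0 else magnitude


print(increment("10101111"))      # 10110000, Example 2.6(b)
print(twos_complement_repr(-65))  # 10111111, Example 2.7(b)
```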

 

            Remark

            The 1's complement of the 1's complement of a k-bit binary string is the original string. The same is true for the 2's complement of the 2's complement of a k-bit binary string. This is essential, since the 1's complement and 2's complement operations correspond to forming the negative of a positive or negative number in the two representation systems, and the negative of the negative of any number is the number itself.

 

2.2.3 Sign Extension

 Sign extension is the operation of converting the 2's complement representation of an integer using some number of bits to the 2's complement representation of the same integer but using more bits. This is done by simply repeating the sign bit as often as required to get the desired number of bits.

 

            Example 2.8 What is the 16-bit sign extension of (a) 01000001, (b) 10111110?

            Solution (a) 0000000001000001, (b) 1111111110111110

 

            Example 2.9  Are the following 16-bit strings sign extensions of 8-bit 2's complement representations? (a) 1011101110001000 (b) 1111111101010101

(c) 0000000001010101 (d) 1111111110101010 (e) 0000000010101010

            Solution (a) No, since the high-order 8 bits are not all the same. (b) No, since the high-order 8 bits are not the same as bit #7 (the sign bit of the least significant byte). (c) Yes, this is the sign extension of 01010101 (the least significant byte). (d) Yes, this is the sign extension of 10101010.

(e) No, since the high-order 8 bits are 0's but bit #7 is 1.
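Sign extension is a one-line string operation. In this minimal Python sketch (the names `sign_extend` and `is_sign_extension` are hypothetical), the second function applies the test used in Example 2.9: a 16-bit string is a sign extension exactly when extending its low-order byte reproduces it.

```python
def sign_extend(code: str, width: int = 16) -> str:
    """Repeat the sign bit (leftmost bit) until the string has width bits."""
    if len(code) > width:
        raise ValueError("code is already wider than the requested width")
    return code[0] * (width - len(code)) + code


def is_sign_extension(code: str, narrow: int = 8) -> bool:
    """True if code is the sign extension of its low-order `narrow` bits."""
    return sign_extend(code[-narrow:], len(code)) == code


print(sign_extend("01000001"))  # 0000000001000001
print(sign_extend("10111110"))  # 1111111110111110
print([is_sign_extension(s) for s in ["1011101110001000", "1111111101010101",
                                      "0000000001010101", "1111111110101010",
                                      "0000000010101010"]])
# [False, False, True, True, False]
```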

 

            Remarks

(1) Using the sign-magnitude system, the negative of any number is formed by complementing the sign bit. The negative of 11000001 is 01000001. Using the 1's complement system, the negative is the 1's complement of the binary string. The negative of 10111110 is 01000001. Using 2's complement, the negative is the 2's complement of the string. The negative of 10111111 is the result of base-2 addition of 1 to 01000000, i.e., 01000001.

            (2) Using sign-magnitude or 1's complement representation systems, there are two representations for 0 (+0 and -0); -0 is represented as 10000000 using 8-bit sign-magnitude representation and as 11111111 using 1's complement representation. There is only one representation for 0 in 2's complement representation.

(3) Range of numbers:  The range of integers that can be represented in 8 bits and n bits is shown in Table 2.2.

 

 

 

 

Representation    8 bits            n bits
Unsigned          0 to 255          0 to $2^n - 1$
Sign-magnitude    -127 to +127      $-(2^{n-1} - 1)$ to $+(2^{n-1} - 1)$
1's complement    -127 to +127      $-(2^{n-1} - 1)$ to $+(2^{n-1} - 1)$
2's complement    -128 to +127      $-2^{n-1}$ to $+(2^{n-1} - 1)$

TABLE 2.2

 

(4) Arithmetic operations: Addition, subtraction, multiplication, and division are the most common arithmetic operations. In 2's complement representation, addition can be done as if the numbers were unsigned numbers. Subtraction is implemented in terms of addition; i.e., X - Y is the same as X + (-Y). This representation has the benefit that subtraction can be done on the same hardware as addition.

            Multiplication and division are easier to implement in sign-magnitude representation. Nevertheless, 2's complement is the preferred representation for all integer operations: addition, subtraction, multiplication, and division. Some computers use sign-magnitude representation for floating point numbers, as discussed in section 2.x.

(5) Interpretation of signed/unsigned: It is important to realize that the interpretation of binary data as a signed or unsigned integer is something that is done by you, the programmer. The computer does not know or care how data is to be interpreted by the programmer. It simply follows whatever algorithm is specified by its program in manipulating the sequence of bits. For example, we will see that a program may "add 1" to the binary value 11000011, resulting in the sum 11000100. This may be interpreted by the programmer as "1 + 195 = 196" or as "1 + (-61) = -60".
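The ranges in Table 2.2, and the two readings of 11000011 and 11000100 in Remark (5), can be checked by brute force. The Python sketch below decodes every n-bit string under each of the four systems; the decoder names are hypothetical and the code favors clarity over speed.

```python
def from_unsigned(code: str) -> int:
    return int(code, 2)


def from_sign_magnitude(code: str) -> int:
    magnitude = int(code[1:], 2)            # bits after the sign bit
    return -magnitude if code[0] == "1" else magnitude


def from_ones_complement(code: str) -> int:
    if code[0] == "1":                      # negative: magnitude = 11...1 - code
        return -((2**len(code) - 1) - int(code, 2))
    return int(code, 2)


def from_twos_complement(code: str) -> int:
    if code[0] == "1":                      # negative: code = 2**n - magnitude
        return int(code, 2) - 2**len(code)
    return int(code, 2)


def value_range(decode, n: int) -> tuple:
    """Smallest and largest values obtained by decoding every n-bit string."""
    values = [decode(format(i, f"0{n}b")) for i in range(2**n)]
    return min(values), max(values)


for decode in (from_unsigned, from_sign_magnitude,
               from_ones_complement, from_twos_complement):
    print(decode.__name__, value_range(decode, 8))
# from_unsigned (0, 255)
# from_sign_magnitude (-127, 127)
# from_ones_complement (-127, 127)
# from_twos_complement (-128, 127)

# Remark (5): one bit pattern, two readings, before and after adding 1.
before = "11000011"
after = format((int(before, 2) + 1) % 256, "08b")
print(after)                                                      # 11000100
print(from_unsigned(before), from_unsigned(after))                # 195 196
print(from_twos_complement(before), from_twos_complement(after))  # -61 -60
```

Sign-magnitude and 1's complement each decode two different strings to 0 (+0 and -0), which is why they cover one value fewer than 2's complement.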

 

 

2.3 HEXADECIMAL REPRESENTATION OF BINARY DATA

 

A binary representation is tedious and difficult for humans to handle. For example, the 16-bit value 1000100110101011 is difficult for the human mind to distinguish from other 16-bit values. A solution to this problem is to introduce a symbol to represent each group of four bits. This is referred to as hexadecimal code, as shown in Table 2.3.

 

Hexadecimal   Binary        Hexadecimal   Binary
0             0000          8             1000
1             0001          9             1001
2             0010          A             1010
3             0011          B             1011
4             0100          C             1100
5             0101          D             1101
6             0110          E             1110
7             0111          F             1111

                                                TABLE 2.3

 

            Using this code, the binary word 1000100110101011 would be represented as 89AB, a much easier notation to grasp.

 

            Example 2.10 (a) What is the hexadecimal representation of 10110011?

(b) What is the hexadecimal representation of the six bit string 100110?

(c) What binary string is represented by 1AF3?

(d) What binary string is represented by the hexadecimal representation 1101?

            Solution (a) B3, as determined from Table 2.3: B represents 1011 and 3 represents 0011. The solution to (b) is 26 since, for purposes of hexadecimal representation, 100110 is considered to be equivalent to 00100110. The solution to (c) is 0001101011110011, as determined from Table 2.3. Similarly, the solution to (d) is 0001000100000001.
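Table 2.3 amounts to replacing each 4-bit group by one symbol. Here is a minimal Python sketch with hypothetical function names. It pads a string on the left to a multiple of 4 bits, as in Example 2.10(b), and then converts group by group:

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(code: str) -> str:
    """Pad on the left to a multiple of 4 bits, then map each group via Table 2.3."""
    width = -(-len(code) // 4) * 4  # round the length up to a multiple of 4
    padded = code.rjust(width, "0")
    return "".join(HEX_DIGITS[int(padded[i:i + 4], 2)]
                   for i in range(0, width, 4))


def hex_to_binary(hex_string: str) -> str:
    """Replace each hexadecimal digit by its 4-bit group."""
    return "".join(format(HEX_DIGITS.index(d), "04b") for d in hex_string.upper())


print(binary_to_hex("10110011"))  # B3
print(binary_to_hex("100110"))    # 26
print(hex_to_binary("1AF3"))      # 0001101011110011
```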

 

            Example 2.11  (a) Find the 8-bit 2's complement representation for -5. Express your answer in hexadecimal form. (b) Find the integer  whose unsigned hexadecimal representation is 1A.

            Solution (a) FB, since the binary 2's complement representation of -5 is 11111011. (b) 26, since 1A is 00011010, which represents 26.

 

There is another approach to the solution of part (b) in the above example. Hexadecimal representations can be treated as base-16 numbers with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, where A, B, C, D, E, F represent the digit values 10 through 15. Thus 1A has the value $1 \times 16 + 10 \times 1 = 26$.

 

 

            Example 2.12 The hexadecimal A3C represents the unsigned integer value $10 \times 16^2 + 3 \times 16 + 12 = 2620$.
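Treating a hexadecimal string as a base-16 numeral can be sketched as follows. The name `hex_value` is hypothetical. The loop uses Horner's rule, which computes the same sum of digits times powers of 16 as Example 2.12 without forming the powers explicitly.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_value(hex_string: str) -> int:
    """Unsigned value of a hexadecimal string, read as a base-16 numeral."""
    value = 0
    for digit in hex_string.upper():
        # Moving one hex position to the left multiplies by 16 (Horner's rule).
        value = value * 16 + HEX_DIGITS.index(digit)
    return value


print(hex_value("1A"))   # 26
print(hex_value("A3C"))  # 2620
```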

 

            Example 2.13  What  8-bit 2's complement signed integer is represented by A9?

            Solution If the high-order hex digit (the most significant hex digit) is in the range 8 through F, then the number, interpreted in 2's complement, is negative. Hence A9 represents a negative number. This problem can be solved by working in binary as follows. A9 represents the binary string 10101001. The 1's complement of this is 01010110. The 2's complement is obtained by adding 1, resulting in 01010111, which represents 87. Hence the solution is -87. Another approach is to work directly in hexadecimal. Just as the binary 1's complement can be formed by subtracting from 11111111, the hexadecimal version of the 1's complement can be formed by subtracting from FF. FF - A9 = 56, since F - A = 5 and F - 9 = 6. The 2's complement can then be formed by adding 1, resulting in 57, which is the same as binary 01010111.
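The hexadecimal shortcut of Example 2.13 (subtract from FF, then add 1) can be sketched for a single byte as below. The function names are hypothetical, and the sketch assumes two-digit (8-bit) values only.

```python
def negate_byte_hex(hex_byte: str) -> str:
    """2's complement of a byte, done in hexadecimal: (FF - x) + 1, kept to 8 bits."""
    ones = 0xFF - int(hex_byte, 16)            # 1's complement: subtract from FF
    return format((ones + 1) & 0xFF, "02X")    # add 1, discard any carry out


def signed_byte(hex_byte: str) -> int:
    """8-bit 2's complement value of a byte written as two hex digits."""
    value = int(hex_byte, 16)
    if value >= 0x80:                          # high-order hex digit 8 through F
        return -int(negate_byte_hex(hex_byte), 16)
    return value


print(negate_byte_hex("A9"))  # 57
print(signed_byte("A9"))      # -87
```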

 

To avoid any ambiguity, a suffix b, d, or h will be used to denote binary, decimal, or hexadecimal notation. For example, 10b = 2d but 10h = 16d.

 

2.4 BCD REPRESENTATION

 

Binary coded decimal (BCD) digits are useful in a number of applications. The BCD coding system associates a 4-bit binary code with each of the ten decimal digits: 0000 for 0, 0001 for 1, ..., 1001 for 9. A byte can therefore hold two BCD digits, representing a decimal number in the range 0 to 99.

 

Example 2.14 (a) What 4-bit binary strings are not BCD codes? (b) What number is represented by BCD code 01100101?

Solution: (a) 1010, 1011, 1100, 1101, 1110, 1111. (b) 0110 represents 6 and 0101 represents 5. Hence 01100101 represents 65.
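Here is a minimal Python sketch of packing and unpacking one BCD byte, with hypothetical function names. The decoder rejects the six non-BCD nibbles listed in Example 2.14(a).

```python
def bcd_encode(n: int) -> str:
    """Pack a decimal number 0..99 into one byte as two BCD digits."""
    if not 0 <= n <= 99:
        raise ValueError("one byte holds only two BCD digits (0 to 99)")
    tens, ones = divmod(n, 10)
    return format(tens, "04b") + format(ones, "04b")


def bcd_decode(code: str) -> int:
    """Decode an 8-bit BCD string, rejecting the six non-BCD nibbles."""
    high, low = int(code[:4], 2), int(code[4:], 2)
    if high > 9 or low > 9:
        raise ValueError(f"{code} is not a valid BCD byte")
    return 10 * high + low


print(bcd_decode("01100101"))  # 65
print(bcd_encode(65))          # 01100101 (the unsigned binary code for 65 is 01000001)
```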

 

2.5 REAL NUMBER REPRESENTATION

 

Real numbers can be "fixed point" or "floating point." Fixed point numbers are used extensively in business applications, especially with two digits to the right of the decimal point. Fixed point representation is very easy to implement. For example, if a program is designed to use two places after the decimal point, then 100 times any fixed point number is an integer, so these values can be stored using integer representations. For example, 13.57 can be stored internally as 1357.
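The two-decimal-place fixed point scheme can be sketched as plain integer arithmetic. This Python sketch assumes that amounts arrive as decimal strings with at most two places after the point. `SCALE`, `to_fixed`, and `to_text` are hypothetical names, and the standard `decimal` module is used only to parse the text without binary rounding.

```python
from decimal import Decimal

SCALE = 100  # two places after the decimal point


def to_fixed(text: str) -> int:
    """Store an amount such as "13.57" as the integer 1357."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{text} has more than two places after the point")
    return int(scaled)


def to_text(fixed: int) -> str:
    """Format a stored integer back as an amount with two decimal places."""
    sign = "-" if fixed < 0 else ""
    whole, cents = divmod(abs(fixed), SCALE)
    return f"{sign}{whole}.{cents:02d}"


price = to_fixed("13.57")
print(price)               # 1357
print(to_text(price * 3))  # 40.71, computed entirely in integer arithmetic
```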

            Scientific applications require the use of floating point values (with varying numbers of digits after the decimal point). Before looking at floating point representations, it is necessary to consider binary fractions.

            Binary fractions are written in positional notation, with the positions to the right of the binary point corresponding to negative powers of 2. For example, 110.101b $= 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} = 6.625$. Such a number can be stored in two bytes, with the most significant byte holding the integral portion (the bits before the binary point) and the least significant byte holding the fraction (the bits after the binary point, starting at bit #7 of that byte). So the number 6.625 can be stored as 0000011010100000. This scheme allocates a fixed amount of storage to the fractional and integral parts, severely restricting the range and precision of a real number.
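The positional rule for binary fractions and the two-byte layout can be sketched as follows. The function names are hypothetical. The two-byte sketch assumes a nonnegative value and rounds to the nearest multiple of 1/256 when the value has more than 8 fraction bits.

```python
def binary_fraction_value(text: str) -> float:
    """Value of a binary numeral such as "110.101" (positional, base 2)."""
    whole, _, fraction = text.partition(".")
    value = 0.0
    for i, bit in enumerate(reversed(whole)):    # positions 2**0, 2**1, ...
        value += int(bit) * 2**i
    for i, bit in enumerate(fraction, start=1):  # positions 2**-1, 2**-2, ...
        value += int(bit) * 2**-i
    return value


def to_two_byte_fixed(value: float) -> str:
    """High byte = integral part, low byte = fraction (value * 256, rounded)."""
    stored = round(value * 256)
    if not 0 <= stored < 2**16:
        raise ValueError("value does not fit in this 16-bit scheme")
    return format(stored, "016b")


print(binary_fraction_value("110.101"))  # 6.625
print(to_two_byte_fixed(6.625))          # 0000011010100000
```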

            Floating point values are typically represented in exponential form, with one integer representing the mantissa and another integer representing the exponent. There are various ways of representing the mantissa and exponent. We present one such scheme, the IEEE/ANSI standard format (IEEE 754). The scheme is somewhat complex, so we present it by means of an example.

 

            Example 2.15 What is the floating point representation in ANSI format of the decimal number $1.6875 \times 2^{-50}$?

            Solution The ANSI single precision format of a floating point number uses 32 bits. In the given number, 1.6875 is referred to as the mantissa. The exponent is -50 and the base is 2.

The mantissa is stored in sign-magnitude form, with the sign bit stored in bit 31 and the mantissa stored in bits 22 through 0. The base of the number is implied and hence not stored. The exponent is stored with a bias of 127; i.e., the stored value is the exponent + 127. The biased exponent is stored in bits 30 through 23.

            The mantissa is normalized: it is expressed so that the only bit before the binary point is a 1. This is referred to as a normalized mantissa. Since this leading 1 is always present, it is never stored; the leading 1 and the binary point are assumed, i.e., "1." is assumed to precede the bits actually stored in bits 22 through 0.

            In the above example, the mantissa is 1.6875d = 1.1011b. Hence the mantissa is stored with a sign bit of 0 (bit 31), since the number is positive, and with the bits after the binary point stored as 10110000000000000000000 (bits 22 through 0).

                        The exponent is stored as -50+127=77d=01001101b.

Hence the ANSI format representation of $1.6875 \times 2^{-50}$ in 32 bits is:

00100110110110000000000000000000
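The steps of Example 2.15 can be sketched in Python for normalized, nonzero values. `math.frexp` splits a number into $m \times 2^e$ with $0.5 \le m < 1$. The sketch rescales that into the $1.f \times 2^{exponent}$ form used above. It then compares the result with the bit pattern produced by Python's `struct` module, whose 'f' format packs IEEE 754 single precision. The function names are hypothetical. Zero, infinities, NaNs, and values outside the normalized range are simply rejected rather than handled.

```python
import math
import struct


def encode_float32(x: float) -> str:
    """Sketch of the 32-bit layout: sign | exponent + 127 | 23 mantissa bits.

    Handles normalized nonzero values only; rounds to the nearest 23-bit
    fraction (ties to even, via Python's round).
    """
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only nonzero finite values are handled here")
    sign = "1" if x < 0 else "0"
    m, e = math.frexp(abs(x))            # abs(x) = m * 2**e with 0.5 <= m < 1
    mantissa, exponent = 2 * m, e - 1    # normalize to 1.f * 2**exponent
    fraction = round((mantissa - 1) * 2**23)  # the bits after the binary point
    if fraction == 2**23:                # rounding carried into the leading 1
        fraction, exponent = 0, exponent + 1
    biased = exponent + 127
    if not 1 <= biased <= 254:
        raise ValueError("outside the normalized single precision range")
    return sign + format(biased, "08b") + format(fraction, "023b")


def float32_bits(x: float) -> str:
    """Cross-check: the bit pattern from struct's IEEE single precision format."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return format(word, "032b")


code = encode_float32(1.6875 * 2**-50)
print(code[0], code[1:9], code[9:])           # 0 01001101 10110000000000000000000
print(code == float32_bits(1.6875 * 2**-50))  # True
```

In hexadecimal, the 32-bit result is 26D80000h.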

 

             Remarks

(1) Normalization: Floating point numbers are stored with a normalized mantissa. Hence "1." is assumed to precede the stored mantissa bits; i.e., the mantissa is always adjusted so that it begins with a 1, followed by the binary point and the rest of the bits. This can be achieved by shifting the mantissa left. The exponent then has to be adjusted so that the value of the floating point number remains unchanged. Choosing a base of 2 makes this simple: for every left shift of the mantissa, the exponent is decremented by 1.

(2) Range of floating point numbers: The ANSI/IEEE format reserves the stored exponents 0 and 255 for special conditions. Thus the actual exponent range for normalized numbers is -126 to 127. The smallest positive normalized floating point number is $1.0 \times 2^{-126}$, i.e., the sign bit is 0, the mantissa bits are all 0's, and the exponent is stored as 1. The largest floating point number is $(2 - 2^{-23}) \times 2^{127}$, i.e., the sign bit is 0, the mantissa bits are all 1's, and the exponent is stored as 254. These values give an approximate decimal range of magnitudes from $10^{-38}$ to $10^{+38}$.

(3) Single precision vs. double precision: The 32-bit representation of a floating point number is referred to as single precision. Double precision is a 64-bit representation that is a straightforward extension of the single precision standard. Double precision uses a sign bit, an 11-bit exponent, and a 52-bit mantissa. Both the range and the precision of floating point numbers are significantly better in the double precision standard.
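The limits in Remark (2) and the double precision layout in Remark (3) can be checked with Python's `struct` module, which packs the 'f' and 'd' formats as IEEE 754 single and double precision. The helper name `float32_from_hex` is hypothetical. The double precision exponent bias of 1023 used here is part of the IEEE 754 standard.

```python
import struct


def float32_from_hex(word: str) -> float:
    """Interpret 8 hex digits as an IEEE single precision bit pattern."""
    return struct.unpack(">f", bytes.fromhex(word))[0]


smallest = 2.0**-126                # sign 0, exponent field 1, mantissa all 0's
largest = (2 - 2**-23) * 2.0**127   # sign 0, exponent field 254, mantissa all 1's
print(float32_from_hex("00800000") == smallest)  # True
print(float32_from_hex("7F7FFFFF") == largest)   # True
print(f"{smallest:.3e} {largest:.3e}")           # 1.175e-38 3.403e+38

# Double precision: 1 sign bit, 11 exponent bits (bias 1023), 52 mantissa bits.
(word64,) = struct.unpack(">Q", struct.pack(">d", 1.6875 * 2**-50))
bits = format(word64, "064b")
print(bits[0], int(bits[1:12], 2) - 1023, bits[12:16])  # 0 -50 1011
```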

 

2.6 CHARACTER REPRESENTATIONS

 

Information may be presented in the form of written words (in English or another natural language), numbers, or pictures. A written word in a natural language is a string of characters. In order for computers to process written words, a binary coding system for characters is needed. The most common binary coding system for characters is 8-bit ASCII (American Standard Code for Information Interchange). IBM personal computers use this code for communication with their video monitors. The MSB of an 8-bit ASCII code is always 0, so sometimes only 7 data bits are used when data is transmitted between computers. These 7 bits provide 128 character codes. Of these, 62 are needed for numerals and upper and lower case letters; an additional 32 are needed for punctuation and other computer keyboard/display symbols. The remaining 34 codes are the space character (20h) and 33 control characters (00h through 1Fh, and 7Fh).

 Two codes of particular interest are:

            (1) 0Ah. This is the ASCII code for linefeed. It is produced by pressing ctrl-J at the keyboard. On output to the video display, it causes the cursor to move down one line.

            (2) 0Dh. This is the ASCII code for carriage return. It is generated when the Enter key is pressed or when ctrl-M is pressed. When used for video display character output, it causes the cursor to return to the left-hand side of the screen. Carriage return and linefeed are often used together to start a new line of text on the video display.

            IBM personal computers use codes with MSB = 1 for graphic symbols. This extended ASCII system is presented in http://www.asciitable.com

            EBCDIC (Extended Binary Coded Decimal Interchange Code) is another character coding system. It is used by IBM mainframe computers.

 

Example 2.16 Represent the string "You!" in hexadecimal.

Solution  59 6F 75 21
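Encoding a string as ASCII and printing the codes in hexadecimal can be sketched in Python as follows. The function name `ascii_hex` is hypothetical. `str.encode("ascii")` raises an error for any character outside the 128 ASCII codes, so extended codes with MSB = 1 are not handled.

```python
def ascii_hex(text: str) -> str:
    """Hexadecimal ASCII codes of the characters in text, separated by spaces."""
    return " ".join(format(byte, "02X") for byte in text.encode("ascii"))


print(ascii_hex("You!"))  # 59 6F 75 21
print(ascii_hex("\r\n"))  # 0D 0A (carriage return, then linefeed)
print(all(byte < 0x80 for byte in "You!".encode("ascii")))  # True: every MSB is 0
```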