Floating point operations are hard to implement on FPGAs because of the complexity of their algorithms. On the other hand, many scientific problems require floating point arithmetic with a high level of accuracy in their calculations. We therefore explore VHDL implementations of an IEEE single precision floating point adder in both concurrent and sequential processing modules. For each processing module, various parameters, i.e. speed, clock period, chip area (number of slices used), modeling format, combinational delay, and total number of destination paths, are compared. This comparison shows which type of processing in a floating point adder takes a longer clock period and which uses less chip area for the same given input.

Keywords: Floating point arithmetic, FPGAs.

I-INTRODUCTION

As demand rises for electronic devices to be smaller, faster and more efficient, increasing importance is placed on well designed pipelined architectures. A pipelined architecture that uses concurrent processing tends to have a shorter clock period and less combinational delay, and hence higher speed, but also consumes more chip area than a standard (non-pipelined) architecture that uses sequential processing. The standard algorithm is the best implementation with respect to area: it has a large overall latency but uses fewer slices [1]. Pipelined architectures reduce latency at the cost of an increase in area compared to the standard architecture. The 2-path implementation shows a 10% reduction in latency at an added expense of 88% in area compared to the standard algorithm. The 5-stage pipelined implementation shows a 6.4% improvement in clock speed. This shows that pipelining depth trades off directly against chip area: as the number of pipeline stages increases, throughput increases, but with the adverse effect of an increase in chip area. Floating point adders are used in a number of applications such as computer graphics, robotics, digital computers, and DSP processors.
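The pipelining trade-off sketched above can be illustrated with a toy timing model in Python. The stage delays below are invented purely for illustration; they are not measurements from any of the cited designs:

```python
# Hypothetical per-stage combinational delays (ns) for a 5-stage pipelined
# adder. These numbers are invented for illustration only.
stages = [4.0, 3.5, 5.0, 3.0, 4.5]

# Non-pipelined (sequential) design: the clock period must cover the whole
# combinational path, i.e. the sum of all stage delays.
sequential_period = sum(stages)   # 20.0 ns

# Pipelined (concurrent) design: the clock period is set by the slowest
# stage, at the cost of one register set (extra area) per stage boundary.
pipelined_period = max(stages)    # 5.0 ns

def total_time(n_results, period, latency_cycles):
    """Time to produce n results: pipeline fill plus one result per cycle."""
    return (latency_cycles + n_results - 1) * period

seq_time = total_time(1000, sequential_period, 1)
pipe_time = total_time(1000, pipelined_period, len(stages))
speedup = seq_time / pipe_time
```

With these illustrative numbers the pipelined version delivers roughly a 4x throughput gain once the pipeline is full, matching the qualitative claim that deeper pipelines raise throughput while each added stage costs register area.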
Floating point adders can be used to perform both addition and subtraction of two floating point numbers. If a floating point operation is to be performed on a decimal number, the number must first be converted to floating point format [1][2]. In this paper, various implementations of the floating point adder are studied for better utilization of chip area and reduced combinational delay, for better throughput.

II-FLOATING POINT FORMAT

A floating point number is composed of three fields and can be in 16, 18, 32 or 64 bit format.

Figure 1: Floating point number format (32-bit word: bit 31 sign, bits 30-23 exponent, bits 22-0 mantissa)

Figure 1 shows the 32 bit format of the IEEE standard for floating point arithmetic.
1 bit sign: A value of '1' indicates a negative number and '0' indicates a positive number.
Bias-127 exponent, e = E + bias: this gives an exponent range from E(min) = -126 to E(max) = 127.
Mantissa: the fractional part of the number. The fractional part must not be confused with the significand, which is 1 plus the fractional part. The leading 1 in the significand is implicit; when performing operations in this format, the implicit bit is usually made explicit.

IJECSE, Volume 1, Number 2; Karan Gumber and Sharmelee Thangjam; ISSN-2277-1956/V1N2-497-501

III-FLOATING POINT ADDER AND ADDITION ALGORITHM

The hardware implementation of the floating point adder is outlined in figure 2 and basically reflects the algorithm presented below in this section. Two points worth noting are hidden bit extraction and reassembly of the result into 32 bit format. First, the implicit bit of each of the operands must be made explicit. Although most of the time this will be a '1', the possibility of the number being zero must not be overlooked. When the biased exponent and the fractional field are both '0', the number represented is zero. Thus, in order to extract the correct bit, two 8-input OR gates are used.
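The field layout and the hidden bit extraction described above can be checked with a small Python sketch. The paper's implementation is in VHDL; this is only a behavioural model, in which the 8-input OR gate over the exponent bits becomes a test against zero:

```python
import struct

def decode_fields(x):
    """Split a single-precision float into sign, biased exponent and mantissa."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31               # bit 31: 1 = negative, 0 = positive
    exponent = (bits >> 23) & 0xFF  # bits 30-23: biased exponent, e = E + 127
    mantissa = bits & 0x7FFFFF      # bits 22-0: fractional part (hidden 1 not stored)
    return sign, exponent, mantissa

def explicit_significand(exponent, mantissa):
    """Make the hidden bit explicit: the OR of the 8 exponent bits supplies
    the 24th bit, so an all-zero exponent (the zero case) yields a '0'."""
    hidden = 1 if exponent != 0 else 0   # models the 8-input OR gate
    return (hidden << 23) | mantissa

# -6.5 = -1.625 * 2^2: sign 1, e = 2 + 127 = 129, mantissa = 0.625 * 2^23.
sign, e, m = decode_fields(-6.5)
```

Making the hidden bit explicit this way turns the stored 23-bit mantissa into the 24-bit significand the addition algorithm below operates on.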
If all the bits of the exponent are '0', then the number is zero and its 24th bit will be zero; otherwise a '1' is inserted. Once the result of the addition is obtained, it must be converted into 32 bit format. The normalization unit shifts the result left until the most significant '1' is in position 24. After the number is normalized, the 24th bit is replaced by the sign of the result. The exponent selected as the result exponent must also be adjusted to reflect the shifting that took place; the shift amount is added to e1 to get the correct exponent. At this point all 32 bits of the result are available, so they can be passed to the next operator or stored in memory.

Figure 2: Floating point adder circuit [2].

Addition Algorithm: In this section, we explain the algorithm used for floating point addition. Given two numbers N1 and N2, the flowchart in figure 3 computes their sum, where e1, e2 and s1, s2 are the exponents and significands of the numbers, respectively. A detailed description of the algorithm follows:
1. Make the 24th bit (hidden bit) explicit: it is '0' if the exponent is '0' and '1' otherwise. At this point 33 bits are needed to store the number: 8 for the exponent, 24 for the significand and 1 for the sign.
2. Compare e1 and e2. If e2 > e1, swap N1 and N2. Note that if a swap takes place, further references in the flowchart to s1 (e1) will be referring to the old s2 (e2) and vice versa. Also, the absolute difference of the exponent values (e2 - e1) needs to be saved.
3. Shift s2 to the right by an amount equal to d = (e2 - e1). Fill the leftmost bits with zeros. Note that both numbers are in simple sign/magnitude format.
4. If N1 and N2 have different signs, replace s2 by its two's complement.
5. Compute the significand, S, of the result by adding s1 and s2.
6. If S is negative, replace it by its two's complement. S is negative when all of the following conditions are true:
a. N1 and N2 have different signs.
b. The most significant bit of S is '1'.
c. There was no carry out in step 5.
7. Normalization step:
a. If N1 and N2 have the same sign and there was a carry out in step 5, shift S right by one, dropping the least significant bit and filling the most significant bit with a '1'.
b. Otherwise, shift S left until there is a '1' in the most significant bit.
c. If S was shifted left more than 24 times, the result is zero.
8. The sign of the result is determined by simply making the output sign the same as the sign of the larger of N1 and N2. The most significant bit is replaced with this sign bit.
9. The resultant exponent (e1) is adjusted by the shift amount determined in step 7. If it was determined in step 7(c) that S = 0, set the exponent to zero.
10. Assemble the result into 32 bit format [2][3][4].

Comparative analysis of Sequential and Concurrent processing for Floating point adder on reconfigurable hardware; ISSN-2277-1956/V1N2-497-501

IV-IMPLEMENTATION OF FLOATING POINT ADDER

The floating point adder can be implemented using different processing styles, i.e. sequential and concurrent processing.

A-Implementation of floating point adder using concurrent processing

Implementation of the floating point adder using concurrent processing involves several stages of pipelining in the design. In concurrent processing, more than one task runs in parallel. Increasing the number of stages increases speed, but also considerably increases chip area. Pipelining is used to decrease the clock period, run the operation at a higher clock rate, and boost speedup by increasing throughput. Pipelining is achieved by dividing the hardware into smaller operations, so that the overall operation takes more clock cycles to complete but a new input can be accepted on each clock cycle, increasing throughput. Figure 3 shows the implementation of the floating point adder using concurrent processing [1][5][6].

Figure 3: Algorithm for floating point addition using concurrent processing [1].
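The ten-step addition algorithm of section III can be modeled in Python as follows. This is a behavioural sketch only: rounding (guard/round/sticky bits), overflow and denormals are ignored, whereas the actual VHDL design implements the full 32-bit datapath:

```python
import struct

MASK24 = (1 << 24) - 1  # the 24-bit significand width

def fields(x):
    """Sign, biased exponent and 23-bit fraction of a single-precision float."""
    b = struct.unpack(">I", struct.pack(">f", x))[0]
    return b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF

def assemble(sign, e, frac):
    """Step 10: pack the result back into 32-bit format."""
    return struct.unpack(">f", struct.pack(">I", (sign << 31) | (e << 23) | frac))[0]

def fp_add(n1, n2):
    sg1, e1, f1 = fields(n1)
    sg2, e2, f2 = fields(n2)
    # Step 1: make the hidden 24th bit explicit ('0' when the exponent is zero).
    s1 = ((1 if e1 else 0) << 23) | f1
    s2 = ((1 if e2 else 0) << 23) | f2
    # Step 2: if e2 > e1, swap N1 and N2 so N1 holds the larger exponent.
    if e2 > e1:
        sg1, sg2, e1, e2, s1, s2 = sg2, sg1, e2, e1, s2, s1
    # Step 3: align s2, shifting right by d = e1 - e2 and filling with zeros
    # (no guard/round/sticky bits, so the result is truncated, not rounded).
    s2 >>= e1 - e2
    # Step 4: if the signs differ, replace s2 by its 24-bit two's complement.
    diff = sg1 != sg2
    if diff:
        s2 = (-s2) & MASK24
    # Step 5: add the significands; `carry` is the carry out of bit 23.
    total = s1 + s2
    S, carry = total & MASK24, total >> 24
    # Step 6: S is negative iff the signs differ, its MSB is '1' and there
    # was no carry out; then take its two's complement and use N2's sign.
    sign = sg1                      # step 8: sign of the larger operand
    if diff and (S >> 23) and not carry:
        S = (-S) & MASK24
        sign = sg2
    # Steps 7 and 9: normalize and adjust the exponent accordingly.
    if not diff and carry:
        S = (1 << 23) | (S >> 1)    # 7a: shift right, refill MSB from carry
        e1 += 1
    elif S == 0:
        e1 = 0                      # 7c: cancellation to exactly zero
    else:
        while not (S >> 23):        # 7b: shift left until the MSB is '1'
            S <<= 1
            e1 -= 1
    return assemble(sign, e1, S & 0x7FFFFF)
```

For example, fp_add(5.0, -3.0) exercises steps 3 through 7: s2 is aligned by one place, two's-complemented, and the sum renormalized by one left shift, giving 2.0.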
B-Implementation of floating point adder using sequential processing

In sequential processing, the next task starts only after the completion of the previous one, i.e. the output of one unit becomes the input of the next unit. Implementation of the floating point adder using sequential processing involves only a single pipeline stage, which implies that no two tasks run concurrently. During sequential processing only one task runs at a time, so less chip area is used, but the output takes considerably more time to appear, which means more combinational delay. To further reduce the area requirement, small floating point representations of 16 and 18 bits are used instead of the 32 bit IEEE format. Figure 4 shows the implementation of the floating point adder using sequential processing [2][5][7]. It shows that the output of one block goes to the input of the next block; no two blocks run in parallel.

Figure 4: Algorithm for floating point addition using sequential processing [2].

V-COMPARATIVE STUDY [1][2][8]

PARAMETERS | SEQUENTIAL PROCESSING | CONCURRENT PROCESSING