/*****************************/
/*          FLOPS.c          */
/* Version 2.0,  18 Dec 1992 */
/*         Al Aburto         */
/*  aburto@marlin.nosc.mil   */
/*       'ala' on BIX        */
/*****************************/

Flops.c is a 'c' program which attempts to estimate your system's
floating-point 'MFLOPS' rating for the FADD, FSUB, FMUL, and FDIV
operations based on specific 'instruction mixes' (discussed below).
The program provides an estimate of PEAK MFLOPS performance by making
maximal use of register variables with minimal interaction with main
memory. The execution loops are all small so that they will fit in
any cache. Flops.c can be used along with Linpack and the Livermore
kernels (which exercise memory much more extensively) to gain further
insight into the limits of system performance. The flops.c execution
modules also include various percent weightings of FDIV's (from 0% to
25% FDIV's) so that the range of performance can be obtained when
using FDIV's. FDIV's, being computationally more intensive than
FADD's or FMUL's, can impact performance considerably on some systems.
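
The idea of a small, register-resident timing loop can be sketched as
follows. This is only an illustration, not one of the flops.c modules:
the names mflops(), kernel(), and time_kernel() are hypothetical, the
integrand x*x is an arbitrary choice, and the standard clock() call
stands in for whatever timer a real run would use. The kernel keeps
its working values in register variables and reports how many
floating-point operations it performed, so that a rate can be derived
from the elapsed time.

```c
#include <assert.h>
#include <time.h>

// Hypothetical helper: convert an operation count and an elapsed time
// in seconds into millions of floating-point operations per second.
double mflops(double flop_count, double seconds)
{
    assert(seconds > 0.0);
    return flop_count / seconds / 1.0e6;
}

// Illustrative kernel (an assumption, not a flops.c module):
// midpoint-rule integration of x*x over [0,1] using n intervals.
// Each iteration is counted as 2 FADD's and 2 FMUL's; the
// integer-to-double conversion is not counted.
double kernel(long n, double *flops_done)
{
    register double sum = 0.0, x, h;
    register long i;

    assert(n > 0);
    h = 1.0 / (double)n;
    for (i = 0; i < n; i++) {
        x = ((double)i + 0.5) * h;   // 1 FADD, 1 FMUL
        sum = sum + x * x;           // 1 FADD, 1 FMUL
    }
    *flops_done = 4.0 * (double)n;
    return sum * h;                  // approximates 1/3
}

// Time one run of the kernel and return its MFLOPS estimate, or 0.0
// when the run was too short for clock() to resolve.
double time_kernel(long n, double *result)
{
    double flops_done, secs;
    clock_t t0, t1;

    t0 = clock();
    *result = kernel(n, &flops_done);
    t1 = clock();
    secs = (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
    return secs > 0.0 ? mflops(flops_done, secs) : 0.0;
}
```

A caller would pick n large enough that the run lasts well beyond the
timer resolution, then print the value returned by time_kernel().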

Flops.c consists of 8 independent modules (routines) which, except for
module 2, conduct numerical integration of various functions. Module
2 estimates the value of pi based upon the Maclaurin series expansion
of atan(1). MFLOPS ratings are provided for each module, but the
program's overall results are summarized by the MFLOPS(1), MFLOPS(2),
MFLOPS(3), and MFLOPS(4) outputs.
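
The series behind module 2 is atan(1) = 1 - 1/3 + 1/5 - 1/7 + ...,
which equals pi/4. A minimal sketch of that calculation is given
below. It is not the module 2 source: the function name
pi_from_atan_series() and its loop layout are assumptions made for
illustration. The sign variable flips on every pass, so each
iteration depends on the one before it.

```c
#include <assert.h>

// Hypothetical illustration: estimate pi by summing the first `terms`
// terms of the Maclaurin series for atan(1) and multiplying by 4.
double pi_from_atan_series(long terms)
{
    double sign = 1.0;    // alternates +1, -1, +1, ... (loop-carried)
    double denom = 1.0;   // odd denominators 1, 3, 5, ...
    double sum = 0.0;
    long k;

    assert(terms > 0);
    for (k = 0; k < terms; k++) {
        sum = sum + sign / denom;   // 1 FADD, 1 FDIV
        denom = denom + 2.0;        // 1 FADD
        sign = -sign;               // sign flip
    }
    return 4.0 * sum;
}
```

The series converges slowly: the error after n terms is bounded by
4/(2n+1), so roughly 100000 terms are needed for four correct
decimal places.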

The MFLOPS(1) result is identical to the result provided by all
previous versions of flops.c. It is based only upon the results from
modules 2 and 3. Two problems surfaced in using MFLOPS(1). First, it
was difficult to completely 'vectorize' the result due to the
recurrence of the 's' variable in module 2. This problem is addressed
in the MFLOPS(2) result which does not use module 2, but maintains
nearly the same weighting of FDIV's (9.2%) as in MFLOPS(1) (9.6%).
The second problem with MFLOPS(1) centers around the percentage of
FDIV's (9.6%) which was viewed as too high for an important class of
problems. This concern is addressed in the MFLOPS(3) result, which
reduces the weighting of FDIV's to 3.4%, and in the MFLOPS(4) result,
where NO FDIV's are conducted at all.

The number of floating-point instructions per iteration (loop) is
given below for each module executed:

MODULE    FADD   FSUB   FMUL   FDIV   TOTAL   Comment
  1          7      0      6      1      14   7.1% FDIV's
  2          3      2      1      1       7   difficult to vectorize.
  3          6      2      9      0      17   0.0% FDIV's
  4          7      0      8      0      15   0.0% FDIV's
  5         13      0     15      1      29   3.4% FDIV's
  6         13      0     16      0      29   0.0% FDIV's
  7          3      3      3      3      12   25.0% FDIV's
  8         13      0     17      0      30   0.0% FDIV's

A*2+3       21     12     14      5      52   A=5, MFLOPS(1), Same as
         40.4%  23.1%  26.9%   9.6%           previous versions of the
                                              flops.c program. Includes
                                              only Modules 2 and 3, does
                                              9.6% FDIV's, and is not
                                              easily vectorizable.

1+3+4       58     14     66     14     152   A=4, MFLOPS(2), New output
+5+6+    38.2%   9.2%  43.4%   9.2%           does not include Module 2,
A*7                                           but does 9.2% FDIV's.

1+3+4       62      5     74      5     146   A=0, MFLOPS(3), New output
+5+6+    42.5%   3.4%  50.7%   3.4%           does not include Module 2,
7+8                                           but does 3.4% FDIV's.

3+4+6       39      2     50      0      91   A=0, MFLOPS(4), New output
+8       42.9%   2.2%  54.9%   0.0%           does not include Module 2,
                                              and does NO FDIV's.
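
The composite rows in the table are weighted sums of the per-module
counts, and the arithmetic can be checked with a short sketch like the
one below. The names module_ops, mix_totals(), and percent() are
hypothetical, and the code only does the operation-count bookkeeping.
It does not reproduce how flops.c turns measured times into the
MFLOPS(n) figures.

```c
#include <assert.h>

enum { FADD, FSUB, FMUL, FDIV, NOPS };
#define NMODULES 8

// Per-iteration operation counts for modules 1..8, copied from the
// table (columns: FADD, FSUB, FMUL, FDIV).
static const int module_ops[NMODULES][NOPS] = {
    { 7, 0,  6, 1},   // module 1
    { 3, 2,  1, 1},   // module 2
    { 6, 2,  9, 0},   // module 3
    { 7, 0,  8, 0},   // module 4
    {13, 0, 15, 1},   // module 5
    {13, 0, 16, 0},   // module 6
    { 3, 3,  3, 3},   // module 7
    {13, 0, 17, 0},   // module 8
};

// weights[m] is how many times module m+1 is counted in the mix.
// Fills totals[] with the per-operation sums and returns their total.
int mix_totals(const int weights[NMODULES], int totals[NOPS])
{
    int grand = 0;
    int op, m;

    for (op = 0; op < NOPS; op++) {
        totals[op] = 0;
        for (m = 0; m < NMODULES; m++) {
            assert(weights[m] >= 0);
            totals[op] += weights[m] * module_ops[m][op];
        }
        grand += totals[op];
    }
    return grand;
}

// Share of one operation type in the mix, in percent.
double percent(int count, int total)
{
    assert(total > 0);
    return 100.0 * (double)count / (double)total;
}
```

For the A*2+3 row with A=5, the weights are {0, 5, 1, 0, 0, 0, 0, 0}.
mix_totals() then yields 21, 12, 14, and 5 for a total of 52, and
percent(5, 52) gives the 9.6% FDIV share quoted for MFLOPS(1).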

NOTE: Various timer routines are included as indicated below. The
      timer routines, with some comments, are attached at the end
      of the main program.
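
As a rough idea of what a compile-time selected timer can look like,
the sketch below returns the CPU time used so far by the process. It
is an assumption-laden illustration and not one of the timer routines
shipped with flops.c: the name cpu_seconds() is hypothetical, the
UNIX branch relies on the POSIX getrusage() call, and the fallback
branch uses the standard C clock() function.

```c
#include <assert.h>
#include <time.h>
#ifdef UNIX
#include <sys/time.h>
#include <sys/resource.h>
#endif

// Hypothetical helper: CPU seconds consumed so far by this process.
// Compile with -DUNIX to select the getrusage() branch.
double cpu_seconds(void)
{
#ifdef UNIX
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);

    assert(rc == 0);
    (void)rc;
    // User-mode CPU time: whole seconds plus microseconds.
    return (double)ru.ru_utime.tv_sec
         + (double)ru.ru_utime.tv_usec * 1.0e-6;
#else
    clock_t ticks = clock();

    assert(ticks != (clock_t)-1);
    return (double)ticks / (double)CLOCKS_PER_SEC;
#endif
}
```

A benchmark loop would call cpu_seconds() before and after the timed
section and use the difference as the elapsed time.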

NOTE: Please do not remove any of the printouts.

EXAMPLE COMPILATION:
UNIX based systems
   cc -DUNIX -O flops20.c -o flops
   cc -DUNIX -DROPT flops20.c -o flops
   cc -DUNIX -fast -O4 flops20.c -o flops
   .
   .
   .
   etc.

Al Aburto
aburto@marlin.nosc.mil