Ken Kennedy
Catalytic Compilers
1900 Embarcadero Rd, #206
Palo Alto, CA 94043
jra@catacomp.com
Foundation
Although this paper is entitled Automatic Loop Interchange, it
is far broader in scope. As an introduction to interchange, the
paper also covers a wide spectrum of dependence-based theory
and transformations. This work is built on the efforts of many
others, and we would be remiss if we did not acknowledge at
least some of those efforts; acknowledging all of them would
quickly blow our page limits. The earliest papers on
dependence-based program transformations include papers by
Lamport [10, 11] and Kuck [9]. Lamport developed a form of
loop interchange for use in vectorization, as well as the
wavefront method for parallelization, an early form of what
came to be called loop skewing.
At that time, we had to pay for computer time by the CPU-minute.
The first time that we tried a large test case (roughly
1000 lines of code), Ken insisted that we limit the CPU time to
10 minutes (which was still several thousand dollars of
computer time) to avoid blowing our research budget. We didn't
expect the test case to complete in the time limit; when it took
only 40 seconds, we assumed that PFC had crashed processing
the input. It took us a day of wading through the output to verify
that it had in fact completely and correctly processed the test.
ACM SIGPLAN
75
Impact
The approaches to dependence and loop interchange presented
in this paper were soon incorporated into a number of
commercial compilers. We are directly aware of the
implementations in the IBM compiler for the 3090 Vector
Feature [13] and the Convex vectorizing compiler, and were
involved in the implementation of the Ardent restructuring
compilers.
Bibliography
1. J. R. Allen. Dependence analysis for subscripted variables
and its application to program transformations. Ph.D.
dissertation, Department of Mathematical Sciences, Rice
University, May 1983.
2. J. R. Allen and K. Kennedy. PFC: a program to convert
Fortran to parallel form. In Supercomputers: Design and
Applications, K. Hwang, editor, pages 186–203. IEEE
Computer Society Press, August 1984.
3. J. R. Allen and K. Kennedy. Automatic translation of
Fortran programs to vector form. ACM Transactions on
Programming Languages and Systems, 9(4):491–542,
October 1987.
4. R. Allen and K. Kennedy. Optimizing Compilers for
Modern Architectures. Morgan Kaufmann, 2002.
5. R. Allen. Unifying vectorization, parallelization, and
optimization: the Ardent compiler. In Proceedings of the
Third International Conference on Supercomputing, 1988.
6. D. Callahan, S. Carr, and K. Kennedy. Improving register
allocation for subscripted variables. In PLDI '90 (also
included in this volume).
7. D. Callahan, J. Dongarra, and D. Levine. Vectorizing
compilers: a test suite and results. In Proceedings of
Supercomputing '88, Orlando, FL, 1988.
8. K. Kennedy. Automatic translation of Fortran programs to
vector form. Rice Technical Report 476-029-4, Department
of Mathematical Sciences, Rice University, 1980.
9. D. Kuck, R. Kuhn, D. Padua, B. Leasure, and M. J. Wolfe.
Dependence graphs and compiler optimizations. In
Conference Record of the Eighth Annual ACM Symposium
on the Principles of Programming Languages,
Williamsburg, VA, January 1981.
10. L. Lamport. The parallel execution of DO loops.
Communications of the ACM, 17(2):83–93, February 1974.
11. L. Lamport. The coordinate method for the parallel
execution of iterative DO loops. Technical Report
CA7608-0221, SRI, Menlo Park, CA, August 1976; revised
October 1981.
12. D. A. Padua and M. J. Wolfe. Advanced compiler
optimizations for supercomputers. Communications of the
ACM, 29(12):1184–1201, December 1986.
13. R. G. Scarborough and H. G. Kolsky. A vectorizing
FORTRAN compiler. IBM Journal of Research and
Development, March 1986.
14. M. E. Wolf and M. Lam. A data locality optimizing
algorithm. In PLDI '91 (also included in this volume).
15. M. J. Wolfe. Techniques for improving the inherent
parallelism in programs. Master's thesis, Department of
Computer Science, University of Illinois at
Urbana-Champaign, July 1978.
16. M. J. Wolfe. Advanced loop interchanging. In Proceedings
of the 1986 International Conference on Parallel
Processing, St. Charles, IL, August 1986.
17. M. J. Wolfe. High Performance Compilers for Parallel
Computing. Addison-Wesley, Redwood City, CA, 1996.
Future Applications
Looking back over the past 18 years, we doubt that we would
have predicted the impact of loop interchange on the compiler
literature. Although our own work and the work of others went
on to more powerful transformation strategies based on direction
and distance matrices [4, 14, 16, 17], this work was one of the
first to establish that powerful and effective program
transformations could be implemented in practical compiler
systems.
Of course, one reason for the growth in importance of this work
is the increased use of parallelism in computer architecture and
the increasing disparity between CPU and memory speeds.
Looking to the future, we believe these factors will only grow
more pronounced in the design of computer systems, making these
compiler techniques even more relevant. Memory hierarchies in
particular increasingly dominate computation times, and
automatic loop interchange is a key transformation for
exploiting that hierarchy.
While loop interchange has been thoroughly explored in the
context of restructuring compilers, there are other contexts
that have not been so thoroughly explored. For instance, given
the intimate relationship between dependence and loop
iterations, it is natural to assume that dependence and loop
interchange should play as important a role in the design
of pipelined architectures as they do in exploiting such
architectures.
Acknowledgements
As was the case at the time the paper was published, this work
has progressed over the years only by the efforts and
collaborations of others far too numerous to list here. However,
we would be remiss if we did not acknowledge the contributions
of Randy Scarborough, Joe Warren, Horace Flatt, and all the
graduate students who worked on PFC.