
Optimizing Enzo

Optimizing Enzo with the Intel compiler

Overview

Below I document my efforts to optimize Enzo using only the command-line options of the Intel v8.1 compiler. A few options give further gains beyond the most aggressive optimization level, -O3. I benchmarked each build with a short, simple simulation: a 64^3 topgrid with dark matter (DM) and 9-species cooling on 4 processors. Because the benchmark is a short run, the absolute time savings aren't impressive, but the percentage changes are what matter: a 15% reduction in run time saves about a day in a week-long simulation. All percentage changes below are relative to the -O2 baserun.
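The week-long estimate above is simple proportional scaling. A minimal sketch of the arithmetic, assuming the 15% figure is read as a reduction in wall-clock time (the function name is only illustrative):

```python
# Scale a fractional run-time reduction up to a long production run.
# Assumption: "15%" means the run takes 15% less wall-clock time.

def days_saved(run_days: float, reduction: float) -> float:
    """Wall-clock days saved when run time drops by `reduction` (0 to 1)."""
    return run_days * reduction

saved = days_saved(run_days=7.0, reduction=0.15)
print(f"{saved:.2f} days saved on a 7-day run")
```

For a week-long run this comes out to roughly one day.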

Baserun

  • Options: -O2
  • Time: 1 minute 54 seconds

Aggressive Optimization

  • Options: -O3
  • Time: 1 minute 43 seconds (9.6% decrease)
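The percentage quoted here can be reproduced from the two timings. A short sketch, with helper names that are my own and not part of Enzo or the compiler:

```python
# Reproduce the percentage change of a run relative to the -O2 baserun.

def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'minutes, seconds' timing to seconds."""
    return 60 * minutes + seconds

def percent_change(baseline_s: float, run_s: float) -> float:
    """Signed change in run time relative to the baseline (negative = faster)."""
    return 100.0 * (run_s - baseline_s) / baseline_s

baseline = to_seconds(1, 54)   # -O2 baserun
o3_run = to_seconds(1, 43)     # -O3 run
change = percent_change(baseline, o3_run)
print(f"-O3: {change:+.1f}% run time")
```

The same function applies to the other timings on this page, since each is measured against the -O2 baserun.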

-fast

Enzo will not compile with the default compiler options specified by -fast, which are -O3 -ipo -static. The -static option causes problems with the MPI libraries, which I don't fully understand. In any case, we don't want to use -ipo, for the reasons described next.

Aggressive Optimization plus IPO

The Intel compiler can optimize a program as a whole rather than as individual routines. Intel calls this Interprocedural Optimization (IPO). The user must also specify -ipo_obj in addition to -ipo in order to use static libraries (e.g. foo.a). In theory, IPO should increase performance, but I observed a significant degradation.

  • Options: -O3 -ipo -ipo_obj
  • Time: 2 minutes 37 seconds (37.7% increase)
  • Note: When using IPO, the linked libraries must be specified in the correct order. If not, IPO complains that there are unknown symbols. In my HDF4 version, the correct order is -lmpi -lmpi++ -limf -lm -lmfhdf -ldf -ljpeg -lz -lstdc++ -lcxa -lunwind -lifcore -lifport.
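Because the order matters, it can help to keep the library list in one place and generate the link flags from it. A small sketch, assuming the HDF4 library order given in the note (the variable and function names are hypothetical):

```python
# Keep the IPO-safe library order in a single list and build the link flags
# from it, so the order is never retyped by hand.
# The order is the one reported above for an HDF4 build; other builds may differ.

IPO_LINK_ORDER = [
    "mpi", "mpi++", "imf", "m", "mfhdf", "df", "jpeg",
    "z", "stdc++", "cxa", "unwind", "ifcore", "ifport",
]

def link_flags(libs: list[str]) -> str:
    """Turn library names into -l flags, preserving the given order."""
    return " ".join(f"-l{name}" for name in libs)

print(link_flags(IPO_LINK_ORDER))
```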

Aggressive Optimization plus Floating Point Optimizations

We can take -O3 a step further and optimize the floating-point operations. I used two options here. -IPF_fma combines floating-point multiplies with additions or subtractions into single fused multiply-add operations. -IPF_fp_speculation_modefast allows the Itanium 2 processor to "speculate" on the next floating-point operation.

  • Options: -O3 -IPF_fma -IPF_fp_speculation_modefast
  • Time: 1 minute 40 seconds (12.3% decrease)
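To illustrate what fusing a multiply and an add means numerically, here is a sketch that emulates a fused multiply-add in software. It shows fused multiply-add semantics in general (one rounding instead of two), not the code the Intel compiler actually emits; the exact-arithmetic emulation via fractions is my own stand-in for the hardware instruction.

```python
from fractions import Fraction

def unfused(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return a * b + c

def fused(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# The exact product is 1 - 2**-60, which rounds to 1.0 in double precision.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("unfused:", unfused(a, b, c))
print("fused:  ", fused(a, b, c))
```

The two results differ because the unfused version loses the low-order bits of the product before the addition. This is one reason results can change slightly between builds with and without such options.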

Above plus PGO

The final step I took in this brief excursion into optimization is the use of Intel's profile-guided optimization (PGO). This is a three-step procedure for producing optimized code.

Step 1

The user compiles the code with the additional option -prof_gen, which makes the executable generate profiles while it runs.

Step 2

The user runs the executable produced in Step 1 to actually produce the performance profiles. It is important to run the code on a simulation that is similar to a typical one. It can be smaller and shorter, but it should traverse the same routines as a typical simulation (e.g. cooling, rebuild hierarchy, communication, etc.).

Step 3

Finally, the user recompiles the code with -prof_use in place of -prof_gen. The compiler uses the profiles to fully optimize the most heavily travelled parts of the code. You cannot use make -j for this step; the code must be compiled serially.

  • Options: -O3 -IPF_fma -IPF_fp_speculation_modefast -prof_use
  • Time: 1 minute 30 seconds (21.1% decrease)
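The three PGO steps can be summarized as a small plan that differs only in the profiling flag. This is a sketch under assumptions: the base flags are taken from the floating-point run above with the spelling used there, I assume Step 1 uses the same base flags as Step 3, and the function names are hypothetical. Nothing is compiled or executed here.

```python
# Summarize the three PGO steps as flag sets; nothing is compiled or run.
# Assumption: the instrumented build uses the same base flags as the final build.

BASE_FLAGS = ["-O3", "-IPF_fma", "-IPF_fp_speculation_modefast"]
PHASE_FLAG = {"generate": "-prof_gen", "use": "-prof_use"}

def pgo_flags(phase: str) -> list[str]:
    """Compiler flags for the instrumented ('generate') or final ('use') build."""
    return BASE_FLAGS + [PHASE_FLAG[phase]]

def pgo_plan() -> list[tuple[str, str]]:
    """Ordered (step, action) pairs for the three-step procedure."""
    return [
        ("Step 1: instrumented build", " ".join(pgo_flags("generate"))),
        ("Step 2: training run", "run the Step 1 executable on a representative problem"),
        ("Step 3: optimized build (serial make, no -j)", " ".join(pgo_flags("use"))),
    ]

for step, action in pgo_plan():
    print(f"{step}: {action}")
```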
