Howto
Timing with the built-in facilities can be done with
> /usr/bin/time -v
Additionally, the DAMASK process can be pinned to core X (X from 0 to N-1, where N is the number of cores) with
> taskset -c X
Hence, to run the example on the second core (core 1), type the following commands:
> export DAMASK_NUM_THREADS=1
> /usr/bin/time -v taskset -c 1 DAMASK_spectral -l tensionX.load -g 20grains16x16x16.geom
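When many runs are compared, it can help to pull just the wall-clock time out of the verbose report. The following is a minimal sketch, assuming GNU time (which labels the field "Elapsed (wall clock) time (h:mm:ss or m:ss)"). The function names and the log file name time.log are hypothetical and not part of DAMASK.

```shell
#!/bin/sh
# Print the value of the elapsed-time field from a GNU "time -v" report
# read on stdin (for example "0:05.25" or "1:02:03").
extract_elapsed() {
    sed -n 's/.*Elapsed (wall clock) time (h:mm:ss or m:ss): *//p'
}

# Convert h:mm:ss or m:ss.ss into seconds with two decimals.
elapsed_to_seconds() {
    echo "$1" | awk -F: '{ s = 0
                           for (i = 1; i <= NF; i++) s = s * 60 + $i
                           printf "%.2f\n", s }'
}

# Usage (time.log is a hypothetical file name; -o is the GNU time
# option that writes the report to a file instead of stderr):
#   /usr/bin/time -v -o time.log taskset -c 1 \
#       DAMASK_spectral -l tensionX.load -g 20grains16x16x16.geom
#   elapsed_to_seconds "$(extract_elapsed < time.log)"
```

Writing the report to a file with -o keeps it separate from whatever DAMASK itself prints to stderr.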
Intel VTune Amplifier
The Intel VTune Amplifier is part of the Intel Parallel Studio XE.
To run the GUI, type
> amplxe-gui
Figure 1: Screenshot of the VTune Amplifier GUI
Spectral Solver Global runtime of examples
The runtimes below were measured using the example files
20grainsYxYxY.geom (Y Fourier points in each direction) and
tensionX.load.
Revision 2220
The runtime of DAMASK interfaced to the basic spectral solver is given for the optimization levels DEFENSIVE and AGGRESSIVE as defined in the
Makefile and for various numbers of threads set by $DAMASK_NUM_THREADS.
msuws2
| CPU | 2x6 cores Intel Xeon |
| Clock speed | 2.93 GHz |
| RAM | 24 GB |
| OS | openSuSE 10.3 |
| Compiler | ifort 12.04 |
Figure 2: Computation time of the basic fixed-point spectral solver for a cube with 16 and 32 Fourier points in each direction. Each result is averaged over 10 runs.
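Averaged timings of this kind could be collected with a loop along the following lines. This is only a sketch under stated assumptions, not the procedure used for the figures: it relies on the GNU time options -f, -a and -o, the helper names are hypothetical, and the thread counts in the usage comment are placeholders.

```shell
#!/bin/sh
# Arithmetic mean of the purely numeric lines on stdin, three decimals.
# Non-numeric lines (e.g. GNU time's "Command exited with non-zero
# status" message) are skipped.
mean() {
    awk '/^[0-9.]+$/ { sum += $1; n++ }
         END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Run a command REPEATS times under GNU time and print the mean
# elapsed wall-clock time in seconds.
# usage: mean_runtime REPEATS command [args...]
mean_runtime() {
    repeats=$1; shift
    log=$(mktemp)
    i=0
    while [ "$i" -lt "$repeats" ]; do
        # -f %e: elapsed seconds only; -a -o: append to the log file
        /usr/bin/time -f %e -a -o "$log" "$@" > /dev/null 2>&1
        i=$((i + 1))
    done
    mean < "$log"
    rm -f "$log"
}

# Usage (thread counts are placeholders):
#   for threads in 1 2 4 8; do
#       export DAMASK_NUM_THREADS=$threads
#       echo "$threads $(mean_runtime 10 DAMASK_spectral \
#           -l tensionX.load -g 20grains16x16x16.geom)"
#   done
```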
Revision 2218
The runtime of DAMASK interfaced to the basic spectral solver is given for the optimization levels DEFENSIVE and AGGRESSIVE as defined in the
Makefile and for various numbers of threads set by $DAMASK_NUM_THREADS.
maws01
| CPU | 2x8(16) cores Intel Xeon |
| Clock speed | 3.10 GHz |
| RAM | 256 GB |
| OS | Ubuntu 12.04 |
| Compiler | ifort 12.1.2 |
Figure 3: Computation time of the basic fixed-point spectral solver for a cube with 16 and 32 Fourier points in each direction. Each result is averaged over 10 runs.
Revision 2012
The runtime of DAMASK interfaced to the basic spectral solver is given for the optimization levels OFF, DEFENSIVE, and AGGRESSIVE as defined in the
Makefile and for various numbers of threads set by $DAMASK_NUM_THREADS.
msuws12
| CPU | 4x12 cores AMD Opteron |
| Clock speed | 3.4 GHz |
| RAM | 512 GB |
| OS | openSuSE 11.4 |
| Compiler | ifort 12.0.4 |
Figure 4: Computation time of the basic fixed-point spectral solver for a cube with 16, 32, and 64 Fourier points in each direction. Each result is averaged over two runs.
Spectral Solver TAU Analysis
TAU plain installation
Install TAU to a directory, referred to below as $TAUDIR.
MISSING: INSTALLATION OPTIONS
Set the following environment variables:
| variable | setting | meaning |
| $TAU_OPTIONS | '-optShared -optPreProcess -optCompInst' | Use the shared library version; preprocess the source code before parsing; use compiler-based instrumentation |
| $TAU_MAKEFILE | $TAUDIR/x86_64/lib/Makefile.tau | Path to TAU's makefile |
| $LD_LIBRARY_PATH | $TAUDIR/x86_64/lib | TAU's library path |
| $TAU_PROFILE | 1 | Enables profiling |
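The settings in the table above can be applied in one step from a small script that is sourced before building and running. This is a sketch only: the fallback value for TAUDIR is hypothetical, and the library path is prepended to LD_LIBRARY_PATH (the variable the dynamic linker reads) rather than overwriting it.

```shell
#!/bin/sh
# Assumption: TAUDIR normally points at the TAU installation already;
# the fallback /opt/tau is purely hypothetical.
TAUDIR=${TAUDIR:-/opt/tau}

export TAU_OPTIONS='-optShared -optPreProcess -optCompInst'
export TAU_MAKEFILE="$TAUDIR/x86_64/lib/Makefile.tau"
# Prepend TAU's library directory, keeping any existing entries.
export LD_LIBRARY_PATH="$TAUDIR/x86_64/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export TAU_PROFILE=1
```

The script has to be sourced (for example with `. ./tau_env.sh`, a hypothetical file name) so that the exports reach the calling shell.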
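Add the following to your Makefile (inside a Makefile the variable must be written as $(TAUDIR), because make reads $TAUDIR as $(T) followed by AUDIR):
- LIBRARIES +=-lz -ldl
- LIB_DIRS +=-L$(TAUDIR)/lib/shared
- @rm -rf *.pp.f90 (at the "clean" target)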
Compile the code with
> make COMPILERNAME=$TAUDIR/x86_64/bin/tau_f90.sh F90=gfortran SUFFIX=-D__GFORTRAN__
Running DAMASK with the settings above creates the following TAU files:
| tautrace.[node].[context].[thread].trc | the trace file for each processor or node |
| profile.[node].[context].[thread] | the profile file |
| events.[node].edf | event file for each processor |
To view the trace with Jumpshot, do the following:
> tau_treemerge.pl; tau2slog2 tau.trc tau.edf -o app.slog2; jumpshot app.slog2
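Chained with `;`, the three commands all run even if an earlier one fails. A slightly more defensive variant is sketched below. The function names are hypothetical; the TAU commands and file names are the ones given above.

```shell
#!/bin/sh
# Succeeds if DIR (default: current directory) holds at least one
# tautrace.*.trc file.
have_traces() {
    for f in "${1:-.}"/tautrace.*.trc; do
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Merge the per-thread traces, convert them to SLOG2, and open Jumpshot,
# stopping at the first step that fails.
view_trace() {
    if ! have_traces .; then
        echo "no tautrace.*.trc files found" >&2
        return 1
    fi
    tau_treemerge.pl &&
        tau2slog2 tau.trc tau.edf -o app.slog2 &&
        jumpshot app.slog2
}

# Usage, from the directory in which DAMASK was run:
#   view_trace
```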
Revision 2319
mupc90x
| CPU | 4 cores Intel Core i5-2500 |
| Clock speed | 3.30 GHz |
| Cache size | 6144 KB |
| Memory size | 6071492 kB |
| OS | Ubuntu 12.04 |
| Compiler | gfortran 4.6.4 |
| TAU version | 2.22-p1 |
DAMASK was compiled with _OPENMP_=OFF; the remaining switches were left at their default values.
The tables below give the runtime of DAMASK interfaced to the basic spectral solver for the tension load case
tensionX.load with the geometries
20grains16x16x16.geom and
20grains32x32x32.geom.
20grains16x16x16.geom
| function | inclusive/s | exclusive/s | exclusive/% | number of calls |
| main | 1912.751 | 0 | 0.0% | 1 |
| damask_spectral_driver | 1912.751 | 4.822 | 0.3% | 1 |
| damask_spectral_solverbasic_MOD_basic_solution | 1904.531 | 0.717 | 0.0% | 100 |
| damask_spectral_utilities_MOD_constitutiveresponse | 1901.821 | 1.413 | 0.1% | 1061 |
| __cpfem_MOD_cpfem_general | 1900.401 | 0.264 | 0.0% | 2122 |
| __homogenization_MOD_materialpoint_stressanditstangent | 1863.328 | 11.01 | 0.6% | 1061 |
| __constitutive_phenopowerlaw_MOD_constitutive_phenopowerlaw_lpanditstangent | 604.523 | 604.304 | 31.6% | 101638887 |
| __crystallite_MOD_crystallite_integratestress | 1250.026 | 544.244 | 28.5% | 47745668 |
| __crystallite_MOD_crystallite_integratestatefpi | 1810.475 | 141.441 | 7.4% | 10620 |
| __constitutive_MOD_constitutive_lpanditstangent | 691.06 | 86.537 | 4.5% | 101638887 |
| __constitutive_MOD_constitutive_collectdotstate | 417.874 | 66.468 | 3.5% | 91245188 |
| __constitutive_MOD_constitutive_tanditstangent | 13.515 | 7.523 | 0.4% | 4505114 |
20grains32x32x32.geom
| function | inclusive/s | exclusive/s | exclusive/% | number of calls |
| main | 14355.854 | 0 | 0.0% | 1 |
| damask_spectral_driver | 14355.854 | 1.544 | 0.0% | 1 |
| damask_spectral_solverbasic_MOD_basic_solution | 14338.573 | 5.844 | 0.0% | 100 |
| damask_spectral_utilities_MOD_constitutiveresponse | 14316.428 | 15.064 | 0.1% | 1178 |
| __cpfem_MOD_cpfem_general | 14301.357 | 1.599 | 0.0% | 2356 |
| __homogenization_MOD_materialpoint_stressanditstangent | 14052.225 | 94.9 | 0.7% | 1178 |
| __crystallite_MOD_crystallite_integratestress | 1.535 | 0.625 | 0.0% | 153472 |
| __crystallite_MOD_crystallite_integratestatefpi | 13606.082 | 13603.998 | 94.8% | 11790 |
| __crystallite_MOD_crystallite_stressanditstangent | 13904.33 | 298.236 | 2.1% | 1179 |
| __homogenization_MOD_materialpoint_postresult | 247.529 | 246.843 | 1.7% | 1178 |
| __crystallite_MOD_crystallite_orientations | 59.285 | 59.089 | 0.4% | 1179 |
| __damask_spectral_utilities_MOD_constitutiveresponse | 14316.428 | 15.064 | 0.1% | 1178 |
| __constitutive_MOD_constitutive_lpanditstangent | 0.587 | 0.029 | 0.0% | 100001 |
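The exclusive/% column appears to be the exclusive time of a function divided by the total runtime (the inclusive time of main). The sketch below reproduces two entries from the tables above as a consistency check. The helper name is hypothetical.

```shell
#!/bin/sh
# Exclusive time as a percentage of the total runtime.
# usage: exclusive_percent EXCLUSIVE_SECONDS TOTAL_SECONDS
exclusive_percent() {
    awk -v e="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", 100 * e / t }'
}

# constitutive_phenopowerlaw_lpanditstangent, 20grains16x16x16.geom
exclusive_percent 604.304 1912.751      # prints 31.6%
# crystallite_integratestatefpi, 20grains32x32x32.geom
exclusive_percent 13603.998 14355.854   # prints 94.8%
```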