Re: Similar systems, different performance of fortran code
More information about this.
Now that the problem seems to lie in g77-3.3, I have profiled the run time
line by line, comparing the binaries produced by g77-2.95 and g77-3.3 on the
sid ("faster") machine.
I'm compiling with the -g option only, and profiling with 'qprof -g line'.
The 2.95-version profile, after sorting with 'sort -n -k2 -r', shows this at
the top:
libm.so.6(sqrt) 163 ( 11%)
libc.so.6(__write) 95 ( 6%)
move_:em1ori.f:1429 57 ( 4%)
move_:em1ori.f:1450 53 ( 3%)
move_:em1ori.f:1443 53 ( 3%)
move_:em1ori.f:1413 53 ( 3%)
This is the 'normal' behavior.
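For anyone who wants to reproduce the ordering, here is a minimal sketch of the sort step. It assumes the profile has been saved to a text file; the file name profile-2.95.txt is made up for illustration, and the sample lines are copied from the listing above.

```shell
# Hypothetical file name; the lines are a shuffled excerpt of the 2.95 profile.
cat > profile-2.95.txt <<'EOF'
move_:em1ori.f:1450 53 ( 3%)
libc.so.6(__write) 95 ( 6%)
libm.so.6(sqrt) 163 ( 11%)
move_:em1ori.f:1429 57 ( 4%)
EOF

# -k2 : the sort key starts at the second field (the sample count)
# -n  : compare that key numerically
# -r  : reverse the order, so the largest counts come first
sort -n -k2 -r profile-2.95.txt
```

The second whitespace-separated field is the sample count, so the most expensive entries end up at the top.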
The 3.3-version profile, on the other hand, shows this:
move_:em1ori.f:1417 1005 ( 9%)
move_:em1ori.f:1416 936 ( 8%)
move_:em1ori.f:1415 930 ( 8%)
move_:em1ori.f:1439 833 ( 7%)
move_:em1ori.f:1443 812 ( 7%)
move_:em1ori.f:1433 767 ( 7%)
move_:em1ori.f:1414 728 ( 7%)
etc. In total, 18 lines have counts above 100 (above 200, in fact), all
of them in the 'move' subroutine.
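A count like this can be obtained with a short awk filter. This is only a sketch: the file name profile-3.3.txt is hypothetical, and the sample is an excerpt of the lines quoted in this message, so it does not reproduce the full figure of 18.

```shell
# Hypothetical file name; excerpt of the 3.3 profile quoted in this message.
cat > profile-3.3.txt <<'EOF'
move_:em1ori.f:1417 1005 ( 9%)
move_:em1ori.f:1416 936 ( 8%)
move_:em1ori.f:1415 930 ( 8%)
move_:em1ori.f:1439 833 ( 7%)
move_:em1ori.f:1443 812 ( 7%)
move_:em1ori.f:1433 767 ( 7%)
move_:em1ori.f:1414 728 ( 7%)
move_:em1ori.f:1459 5 ( 0%)
move_:em1ori.f:1470 25 ( 0%)
move_:em1ori.f:1473 11 ( 0%)
EOF

# Count the entries that belong to 'move' and have more than 100 samples.
# $1 is the location (routine:file:line), $2 is the sample count.
awk '$1 ~ /^move_:/ && $2 > 100 { n++ } END { print n + 0 }' profile-3.3.txt
```

The threshold of 100 is the one used in the text; changing it to 200 gives the stricter count.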
However, there is nothing unusual about the lines in question. The worst one, line 1417, is:
1417: abzpt=abz(ij+1)+del*(abz(ij+2)-abz(ij+1))+ab0z
All "heavy" lines are of the same kind:
1417: abzpt=abz(ij+1)+del*(abz(ij+2)-abz(ij+1))+ab0z
1416: abypt=aby(ij+1)+del*(aby(ij+2)-aby(ij+1))
1415: aezpt=aez(ij+1)+del*(aez(ij+2)-aez(ij+1))
1439: f=2.d0/(1.d0+abxpt*abxpt+abypt*abypt+abzpt*abzpt)
1443: gvxs=gvxs+vvy*abzpt-vvz*abypt+aexpt
1433: vvx=gvxs+gvys*abzpt-gvzs*abypt
1414: aeypt=aey(ij+1)+del*(aey(ij+2)-aey(ij+1))
etc.
These lines cover almost all of the 'move' subroutine, or at least the part
inside the loop that moves the particles. I would be happy if it were that
simple, but it is not, since the last part of the loop, which contains lines
such as:
1459: jym(ij+1)=jym(ij+1)+dells*qdtdn*vy(j)
[...]
1470: delrs=x(j)-ij
[...]
1473: rho(ij+2)=rho(ij+2)+delrs*qdxdn
etc., does not get counts as high as the rest:
move_:em1ori.f:1459 5 ( 0%)
move_:em1ori.f:1470 25 ( 0%)
move_:em1ori.f:1473 11 ( 0%)
However, these lines of code are executed as many times as the previous ones.
The only correlation I can see is that the first part of the code consists
mainly of lines that assign values to individual (scalar) variables, whereas
the second part, with fewer counts in the profile, consists mainly of lines
that assign values to array elements.
Does this make any sense to anyone? Could the compiler be doing something
strange depending on the kind of variable being assigned to?
Regards,
Victor