
Re: Similar systems, different performance of fortran code



More information about this.

Now that I know the problem seems to be in g77-3.3, I have done line-by-line
time profiling, comparing the output of g77-2.95 and g77-3.3 on the
sid ("faster") machine.

I'm compiling with the -g option only, and profiling with 'qprof -g line'.

The 2.95-version profile, after sorting with 'sort -n -k2 -r', shows this at
the top:

libm.so.6(sqrt)                                                  163    ( 11%)
libc.so.6(__write)                                               95     (  6%)
move_:em1ori.f:1429                                              57     (  4%)
move_:em1ori.f:1450                                              53     (  3%)
move_:em1ori.f:1443                                              53     (  3%)
move_:em1ori.f:1413                                              53     (  3%)

This is the 'normal' behavior.
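
For reference, the sorting step can be sketched roughly as follows. This
assumes the profile has been saved to a file (the name profile-2.95.txt is
hypothetical) with the sample count in the second whitespace-separated
column; the three rows are copied from the listing above, deliberately out
of order.

```shell
# Hypothetical file name; the rows are taken from the 2.95 profile above.
cat > profile-2.95.txt <<'EOF'
move_:em1ori.f:1429                                              57     (  4%)
libm.so.6(sqrt)                                                  163    ( 11%)
libc.so.6(__write)                                               95     (  6%)
EOF

# -k2: the sort key starts at the second column (the sample count)
# -n:  compare numerically
# -r:  reverse the order, so the largest count comes first
sort -n -k2 -r profile-2.95.txt > sorted-2.95.txt
head -n 3 sorted-2.95.txt
```

With these options the most expensive entries end up at the top of the
file, which is how the listings in this message are ordered.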

The 3.3-version profile, in contrast, shows this:

move_:em1ori.f:1417                                              1005   (  9%)
move_:em1ori.f:1416                                              936    (  8%)
move_:em1ori.f:1415                                              930    (  8%)
move_:em1ori.f:1439                                              833    (  7%)
move_:em1ori.f:1443                                              812    (  7%)
move_:em1ori.f:1433                                              767    (  7%)
move_:em1ori.f:1414                                              728    (  7%)

etc. In total, 18 lines have counts above 100 (above 200, in fact), all
of them part of the 'move' subroutine.
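
A count like this can be obtained along the following lines. This is only a
sketch: the file name profile-3.3.txt is hypothetical, the column layout is
assumed to match the listings in this message, and only four sample rows
(copied from this message) are used, so the result here is 2 rather than 18.

```shell
# Hypothetical file name; the rows are copied from the 3.3 profile.
cat > profile-3.3.txt <<'EOF'
move_:em1ori.f:1417                                              1005   (  9%)
move_:em1ori.f:1416                                              936    (  8%)
move_:em1ori.f:1459                                              5      (  0%)
move_:em1ori.f:1470                                              25     (  0%)
EOF

# Count the entries belonging to 'move' whose sample count exceeds 100.
# "n + 0" makes awk print 0 instead of an empty string when nothing matches.
heavy=$(awk '$1 ~ /^move_:/ && $2 > 100 { n++ } END { print n + 0 }' profile-3.3.txt)
echo "$heavy"
```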

However, the lines in question are nothing unusual. The worst one, line 1417, is:

1417:            abzpt=abz(ij+1)+del*(abz(ij+2)-abz(ij+1))+ab0z

All "heavy" lines are of the same kind:

1417:            abzpt=abz(ij+1)+del*(abz(ij+2)-abz(ij+1))+ab0z
1416:            abypt=aby(ij+1)+del*(aby(ij+2)-aby(ij+1))
1415:            aezpt=aez(ij+1)+del*(aez(ij+2)-aez(ij+1))
1439:            f=2.d0/(1.d0+abxpt*abxpt+abypt*abypt+abzpt*abzpt)
1443:            gvxs=gvxs+vvy*abzpt-vvz*abypt+aexpt
1433:            vvx=gvxs+gvys*abzpt-gvzs*abypt
1414:            aeypt=aey(ij+1)+del*(aey(ij+2)-aey(ij+1))

etc.

These lines cover almost all of the 'move' subroutine, at least the part
inside the loop which moves the particles. I would be happy if it were that
simple, but it's not, since the last part of the loop, which contains lines
such as:

1459:            jym(ij+1)=jym(ij+1)+dells*qdtdn*vy(j)
[...]
1470:            delrs=x(j)-ij
[...]
1473:            rho(ij+2)=rho(ij+2)+delrs*qdxdn

etc., does not get counts as high as the rest:

move_:em1ori.f:1459                                              5      (  0%)
move_:em1ori.f:1470                                              25     (  0%)
move_:em1ori.f:1473                                              11     (  0%)

However, these lines of code are executed as many times as the previous ones.

The only correlation I can see is that the first part of the code consists
mainly of lines that assign values to individual variables, whereas the
second part, with fewer counts in the profile, consists mainly of lines that
assign values to array elements.

Does this make any sense to anyone? Could the compiler be doing something
strange depending on the kind of variable being assigned to?

Regards,

					Victor
					



