
Re: Bug#1042036: rocblas: FTBFS: AttributeError: 'KernelWriterAssembly' object has no attribute 'language'



Thanks Lucas,

On 2023-07-25 14:56, Lucas Nussbaum wrote:
# Writing Kernels...
Generating kernels: Launching 8 threads...
Traceback (most recent call last):
   File "/<<PKGBUILDDIR>>/tensile/Tensile/Parallel.py", line 54, in apply_print_exception
     return func(*args)
            ^^^^^^^^^^^
   File "/<<PKGBUILDDIR>>/tensile/Tensile/TensileCreateLibrary.py", line 67, in processKernelSource
     header = kernelWriter.getHeaderFileString(kernel)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/<<PKGBUILDDIR>>/tensile/Tensile/KernelWriter.py", line 5065, in getHeaderFileString
     if self.language == "HIP" or self.language == "OCL":
        ^^^^^^^^^^^^^
AttributeError: 'KernelWriterAssembly' object has no attribute 'language'
Custom kernel filename /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/library/src/build_tmp/TENSILE/assembly/DGEMM_Aldebaran_NN_MT128x128x16_MI16x16x4x1_GRVW2_SU4_SUS128_WGM4.s
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
   File "/usr/lib/python3.11/multiprocessing/pool.py", line 125, in worker
     result = (True, func(*args, **kwds))
                     ^^^^^^^^^^^^^^^^^^^
   File "/usr/lib/python3.11/multiprocessing/pool.py", line 51, in starmapstar
     return list(itertools.starmap(args[0], args[1]))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/<<PKGBUILDDIR>>/tensile/Tensile/Parallel.py", line 54, in apply_print_exception
     return func(*args)
            ^^^^^^^^^^^
   File "/<<PKGBUILDDIR>>/tensile/Tensile/TensileCreateLibrary.py", line 67, in processKernelSource
     header = kernelWriter.getHeaderFileString(kernel)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/<<PKGBUILDDIR>>/tensile/Tensile/KernelWriter.py", line 5065, in getHeaderFileString
     if self.language == "HIP" or self.language == "OCL":
        ^^^^^^^^^^^^^
AttributeError: 'KernelWriterAssembly' object has no attribute 'language'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
   File "/<<PKGBUILDDIR>>/tensile/Tensile/bin/TensileCreateLibrary", line 43, in <module>
     TensileCreateLibrary()
   File "/<<PKGBUILDDIR>>/tensile/Tensile/TensileCreateLibrary.py", line 1303, in TensileCreateLibrary
     codeObjectFiles = writeSolutionsAndKernels(outputPath, CxxCompiler, None, solutions,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/<<PKGBUILDDIR>>/tensile/Tensile/TensileCreateLibrary.py", line 482, in writeSolutionsAndKernels
     results = Common.ParallelMap(processKernelSource, kIter, "Generating kernels", method=lambda x: x.starmap, maxTasksPerChild=1)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/<<PKGBUILDDIR>>/tensile/Tensile/Parallel.py", line 134, in ParallelMap
     rv = mapFunc(function, objects)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/lib/python3.11/multiprocessing/pool.py", line 375, in starmap
     return self._map_async(func, iterable, starmapstar, chunksize).get()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/lib/python3.11/multiprocessing/pool.py", line 774, in get
     raise self._value
AttributeError: 'KernelWriterAssembly' object has no attribute 'language'
make[3]: *** [library/src/CMakeFiles/TENSILE_LIBRARY_TARGET.dir/build.make:92: Tensile/library/TensileLibrary.dat] Error 1

This build failure is non-deterministic. I've seen it before, but I had thought it only occurred when specifying the AMDGPU_TARGETS property in the rocBLAS build. It seems it can occur even without that. Specifying a reduced set of AMDGPU_TARGETS may merely increase the probability of failure.
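
One way to put a number on how often the failure occurs would be to repeat the failing step and count the non-zero exits. This is only a minimal sketch: count_failures and the command passed to it are hypothetical placeholders, not part of rocBLAS, Tensile, or the Debian packaging, and the real build step would need to be substituted in.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Placeholder command that always succeeds; substitute the real build step.
    cmd = [sys.executable, "-c", "pass"]
    runs = 5
    print(f"{count_failures(cmd, runs)} of {runs} runs failed")
```

Comparing the count with and without a reduced AMDGPU_TARGETS would show whether that setting really changes the odds.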

The missing language attribute is an indication that the KernelWriterAssembly object was not initialized before it was used. I have never seen this when building the upstream project, so I suspect that this is related to the removal of the replacement kernels that were excluded on DFSG grounds during Debian packaging. I also suspect that this build failure is just one symptom, and that the test failures we see on the gfx900 and gfx906 architectures may be caused by incorrectly generated assembly related to the replacement kernels.
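
To illustrate the general mechanism (not the actual Tensile code path, which I have not traced), here is a small sketch of how an object whose initializer never ran produces this kind of AttributeError. The Writer class is hypothetical and only mimics the shape of the check in getHeaderFileString.

```python
class Writer:
    """Hypothetical stand-in for a kernel writer; not Tensile's class."""

    def __init__(self):
        # The attribute only exists once __init__ has run.
        self.language = "ASM"

    def header(self):
        if self.language == "HIP" or self.language == "OCL":
            return "source header"
        return "assembly header"


# Normal construction runs __init__, so the attribute exists.
ok = Writer()
print(ok.header())

# Creating the instance without running __init__ leaves `language` unset.
broken = Writer.__new__(Writer)
try:
    broken.header()
except AttributeError as exc:
    print(exc)
```

How the real KernelWriterAssembly object ends up in that state is still an open question; the sketch only shows why the symptom points at initialization.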

We could run a test build with the replacement kernels restored to verify if this is the case. Even if the replacement kernels cannot be packaged in Debian, a local build with them restored may help us to confirm or falsify my theory as to the cause of this failure.

We can also take a look at the YAML specification that drives the generation of DGEMM_Aldebaran_NN_MT128x128x16_MI16x16x4x1_GRVW2_SU4_SUS128_WGM4.s. A scorched-earth approach to dealing with this issue would be to delete the YAML of problematic assembly kernels until the rocBLAS build and tests stop failing. That may have a serious adverse effect on performance, but it could restore correctness as the library would fall back to using source kernels. We should avoid doing that if possible, but it's an option available if we can find no other solution.
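
As a starting point for locating the relevant YAML, a plain text search may be enough. This is a rough sketch that assumes the YAML files mention the kernel name (or some fragment of it) verbatim, which I have not confirmed; yaml_files_mentioning is a hypothetical helper and the search root is left to the caller.

```python
from pathlib import Path


def yaml_files_mentioning(root, needle):
    """Return sorted paths of .yaml files under root whose text contains needle."""
    matches = []
    for path in Path(root).rglob("*.yaml"):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            # Skip files that cannot be read.
            continue
        if needle in text:
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Kernel name taken from the build log; "." is a placeholder search root.
    name = "DGEMM_Aldebaran_NN_MT128x128x16_MI16x16x4x1_GRVW2_SU4_SUS128_WGM4"
    for path in yaml_files_mentioning(".", name):
        print(path)
```

If the full name yields nothing, searching for a shorter fragment of it would be the next thing to try.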

Sincerely,
Cory Bloor

