
Re: Google Summer of Code, Debian Science



Hi Anton, I suggest the following wording for an MPI Performance Testing project.

cc Lucas: I'm aiming to set up MPI testing protocols that could be run on g5k to validate the parallel performance of Debian packages.



Project: Debian Cloud Computing (MPI) Performance Testing
-------

Aim: to characterize and monitor the performance of Debian cloud computing (MPI) packages
---

Objective: to establish Standard Operating Procedures for monitoring the performance of Debian MPI packages
---------
1) protocol for managing test launching on cloud computing installations
2) protocol for managing test results (e.g. database)
3) protocol for reporting results

Background
----------

Complex (large-scale) scientific computation is facilitated by parallelization of numerical libraries and end-user packages, typically via MPI (Message Passing Interface). The numerical library stack for applications is usually complex. For instance, FEniCS is a library providing automated solution of differential equations using Finite Element Methods. The numerical stack for FEniCS (python3-dolfinx) could be summarized as

python3-dolfinx
       |
     numpy - libdolfinx-dev - petsc4py - PETSc
                                          |
        hypre - scotch - superlu-dist - mumps
            |
       scalapack - libhdf5-openmpi - mpi4py - openmpi
                                                |
                      BLAS - xsimd - lapack - basix


An upgrade of any package, or a change in the configuration of any package, at any point along the chain has the potential to greatly impact the parallel performance of the end-user application, either positively or negatively depending on the change. We want to be certain that the integrity of the parallel performance of Debian packages is maintained. This complements the existing CI testing reported at https://ci.debian.net/. CI testing helps ensure packages continue to run correctly. With Performance Testing, we also want to be confident that they continue to scale satisfactorily in the HPC or cloud computing sense.
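To make "scale satisfactorily" measurable, each run can be reduced to speedup and parallel efficiency relative to the smallest process count. A minimal sketch of that reduction (the timing numbers below are illustrative placeholders, not real benchmark data):

```python
# Compute speedup and parallel efficiency from strong-scaling timings.
# The timings used in the example are illustrative only.

def scaling_summary(timings):
    """timings: {n_processes: wall_time_seconds} for a fixed problem size.

    Returns {n_processes: (speedup, efficiency)} relative to the
    smallest process count in the series.
    """
    base_n = min(timings)
    base_t = timings[base_n]
    summary = {}
    for n, t in sorted(timings.items()):
        speedup = base_t / t
        # Ideal speedup relative to the smallest run is n / base_n.
        efficiency = speedup / (n / base_n)
        summary[n] = (speedup, efficiency)
    return summary

# Hypothetical wall times for 1, 2 and 4 MPI processes.
for n, (s, e) in scaling_summary({1: 100.0, 2: 55.0, 4: 31.25}).items():
    print(f"{n} procs: speedup {s:.2f}, efficiency {e:.2f}")
```

A monitoring protocol would then flag version-to-version drops in efficiency at a given process count.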

Complementary to monitoring MPI package performance over time, we also want to be able to report comparative performance where alternative packages are available. This point is particularly relevant to the Debian BLAS packages. BLAS clients are expected to build against the generic BLAS (libblas3) but run against optimized BLAS implementations (e.g. openblas, blis, atlas). In the case of openblas, serial, pthread and openmp threading variants are available. How does the performance of the end-user application compare across these different BLAS implementations?
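One way to drive such a comparison on Debian is to re-time the same benchmark under each alternative (the runtime BLAS is selected with update-alternatives, e.g. the libblas.so.3 alternative on amd64). The harness below is only a sketch: the workload function is a placeholder for where the real benchmark launch (e.g. fenicsx-performance-tests via mpirun) would go.

```python
import timeit

def time_workload(workload, repeat=3, number=1):
    """Return the best-of-repeat wall time (seconds) for a callable."""
    return min(timeit.repeat(workload, repeat=repeat, number=number))

# Placeholder workload: in the real procedure this would launch the MPI
# benchmark after selecting a BLAS alternative with update-alternatives.
def workload():
    sum(i * i for i in range(100_000))

print(f"best wall time: {time_workload(workload):.4f} s")
```

Taking the best of several repeats reduces noise from shared machines, which matters if results from different BLAS variants are to be compared fairly.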

Task
----

The FEniCS project provides a package for performance testing (fenicsx-performance-tests). This will be used as the starting point for the project.

1) What runtime parameters should fenicsx-performance-tests be launched with in order to provide sufficiently meaningful scaling tests?
2) How should the launch of tests be managed? (should we use https://reframe-hpc.readthedocs.io/ ?)
2b) How should test machines be managed (e.g. how to configure g5k machines to run tests)? How to manage BLAS comparisons?
3) How should the record of test results be managed? (what kind of database or flatfile?)
4) How should test results be presented? (FEniCS uses plotly to generate https://fenics.github.io/performance-test-results/. Should we follow this or use a different presentation?)
5) How should we integrate test result pages with Debian websites? (e.g. link from https://ci.debian.net/ or manage elsewhere?)
6) Can we apply these procedures to benchmark other packages, e.g. nwchem, lammps?
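As one possible answer to point 3, a single SQLite file per test series would be enough to start with. The schema and record below are purely illustrative, not an agreed format:

```python
import sqlite3

# Illustrative schema for storing scaling-test results; the table and
# column names (and the sample record) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        test_name   TEXT,
        package     TEXT,
        version     TEXT,
        n_processes INTEGER,
        wall_time   REAL,
        run_date    TEXT
    )
""")
conn.execute(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
    ("poisson-weak", "fenicsx-performance-tests", "0.5.1-1",
     4, 31.25, "2022-03-01"),
)
rows = conn.execute(
    "SELECT n_processes, wall_time FROM results WHERE test_name = ?",
    ("poisson-weak",),
).fetchall()
print(rows)  # [(4, 31.25)]
```

A flat file (CSV/JSON) would serve equally well at small scale; the important part is recording package versions alongside timings so regressions can be traced to uploads.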


GSoC Candidate
--------------

The candidate is expected to be competent in shell scripting, general coding (e.g. python, C), managing installation of Debian packages and building software.
MPI and HPC experience is not mandatory but would be an advantage.


Drew




On 2022-02-22 13:06, Anton Gladky wrote:
Hello Drew,

It is a very good idea!

Though I would separate this task from the QA work on Debian Science
packages. If you want, we could add one more project (something
like HPC testing of MPI-based science packages) and specify special
requirements for possible applicants. Feel free to propose a text
for that. Thanks again!

Regards

Anton

Am Di., 22. Feb. 2022 um 12:52 Uhr schrieb Drew Parsons <dparsons@debian.org>:

On 2022-02-21 17:42, Anton Gladky wrote:
> Dear all,
>
> Google Summer of Code call for Debian is announced [1].
> I am going to apply Debian Science Team as one of the projects.
>
> Main topic is QA-Work: Autopkgtests for high-popcon packages,
> gitlab-CI for most of packages, bringing not-in-testing packages
> into the proper shape to let them migrate to testing.
>
> If somebody wants to be a co-mentor or if you have better ideas
> for the project, please let me know.
>
> [1]
> https://lists.debian.org/debian-devel-announce/2022/02/msg00002.html


It would be helpful to run parallel/HPC performance testing for our MPI
numerical packages.

This would be a type of CI testing that we would set up to run regularly
and report.
Lucas Nussbaum is in charge of an academic supercomputing cluster that
we can access to run such tests.

Some packages have benchmarks already at hand.  The FEniCS project for
instance offers fenicsx-performance-tests (both prebuilt and source).

The project would determine how to set up MPI CI testing (how to
activate it on Lucas' system), and what parameters (test size etc.) to
use to get meaningful numbers.
A suggested tool for managing test parameters and results might be
https://reframe-hpc.readthedocs.io/en/stable/

The report format could be similar to
https://fenics.github.io/performance-test-results/
or perhaps the GSoC worker could come up with a better way of presenting
results.

It would be useful to be able to quantify how well our HPC packages
actually scale (in cloud computing environments) and monitor if there's
any drop in performance (e.g. with version updates)

Also useful to report their performance with the various BLAS
alternatives.

This would be a valuable GSoC project, I think.

Drew

