[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#623453: ITP: xdmf -- eXtensible Data Model and Format



Package: wnpp
Severity: wishlist
Owner: Alastair McKinstry <mckinstry@debian.org>

* Package name    : xdmf
  Version         : 2.1
  Upstream Author : Name <somebody@
* URL             : http://www.xdmf.org/
* License         : BSD
  Programming Lang: C++, Python, Fortran
  Description     : eXtensible Data Model and Format library

The need for a standardized method to exchange scientific data between High Performance Computing codes and tools lead to the development of the eXtensible Data Model and Format (XDMF) . Uses for XDMF range from a standard format used by HPC codes to take advantage of widely used visualization programs like ParaView, to a mechanism for performing coupled calculations using multiple, previously stand alone codes.

Data format refers to the raw data to be manipulated. Information like number type ( float, integer, etc.), precision, location, rank, and dimensions completely describe the any dataset regardless of its size. The description of the data is also separate from the values themselves. We refer to the description of the data as Light data and the values themselves as Heavy data. Light data is small and can be passed between modules easily. Heavy data may be potentially enormous; movement needs to be kept to a minimum. Due to the different nature of heavy and light data, they are stored using separate mechanisms. Light data is stored using XML, Heavy data is typically stored using HDF5. While we could have chosen to store the light data using HDF5 attributes using XML does not require every tool to have access to the compiled HDF5 libraries in order to perform simple operations.

Data model refers to the intended use of the data. For example, a three dimensional array of floating point vales may be the X,Y,Z geometry for a grid or calculated vector values. Without a data model, it is impossible to tell the difference. Since the data model only describes the data, it is purely light data and thus stored using XML. It is targeted at scientific simulation data concentrating on scalars, vectors, and tensors defined on some type of computational grid. Structured and Unstructured grids are described via their topology and geometry. Calculated, time varying data values are described as attributes of the grid. The actual values for the grid geometry, connectivity, and attribute values are contained in the data format. This separation of data format and model allows HPC codes to efficiently produce and store vales in a convenient manner without being encumbered by our data model which may be different from their internal arrangement.


XDMF uses XML to store Light data and to describe the data Model. HDF5 is used to store Heavy data. The data Format is stored redundantly in both XML and HDF5. This allows tools to parse XML to determine the resources that will be required to access the Heavy data. 


 XDMF is used in the visualization tools Paraview (already in Debian) and VisIt (which I am packaging).



Reply to: