Re: Distributed visualization anyone?
"Eray Ozkural (exa)" wrote:
> Adam C Powell IV wrote:
> > "Eray Ozkural (exa)" wrote:
> > > About unstructured grids: I think you can't do that with PetSC because
> > > PetSC has no idea of sparsity.
> > Hmm, I'm not quite sure what you mean by "sparsity". PETSc has some very nice
> > distributed sparse matrix solvers, though it really doesn't have an object which
> > is dynamically resizable on each node, which would be helpful for an
> > unstructured grid. This is one reason I haven't done the communications part
> > yet, that is, sending all of the triangle data to node 0.
> Well, what I meant by sparsity is that you would need a distributed
> data structure for computing the workload and communication volume required
> by the distribution of the irregular grid on the processors. ( I know it
> sounds kind of twisted )
> That is, the distributed data structure (which would be a graph or a
> hypergraph) has sufficient information to compute what a given data
> distribution (and replication) would result in.
> The objectives are naturally load balancing and minimization of communication
> volume. Minimization of total comms. volume is a more important objective
> than load balancing in this (it seems)
I see. My impression is that (Par)METIS uses number of nodes and connections in the
graph (something like number of off-diagonal matrix block entries in PETSc) to
estimate load and communication volume respectively. These seem to me like decent
approximations, and do not seem too hard to calculate in a distributed way...
But this is quickly moving beyond my knowledge of this field.
> > On the other hand, one can create the triangle data on each node, then create a
> > vector with the different sizes on each node to handle the data, send everything
> > to node 0, and destroy the vector. It's inelegant, but should work. I think
> > I'll try that. I'll try to put up 0.1 this weekend before implementing this
> > (just got some showstopper automake/PETSc issues settled), then do this for 0.2.
> If you do as much of the shading at the nodes and then send the results
> to the interface node (beo master), this way you should be able to handle
> regular grids.
I actually don't do any shading in the nodes (yet), I just send the triangles and
their colors to Geomview and let that calculate shading on all of the triangles. As
you say (below), this is not as efficient as calculating shading at the nodes, but
seems like a good start. When that much works, then I'll consider pushing more
rendering onto the nodes...
Okay, just released PETScGraphics 0.1 (gotta think of a better name...) at
http://lyre.mit.edu/~powell/petscgraphics.html . Note that for the test provided
("make check"), transferring to Geomview and rendering there take a lot more time
than calculating isoquants on one processor! There are a number of TODO items to cut
the communication and rendering time, but this raises the priority of distributed
So I'll attempt a couple of the TODOs and do distributed triangulation then upload
0.2, in a couple of weeks or so. Then start on the distributed rendering.
> > > A friend of mine did parallel volume
> > > visualization on irregular grids but that requires a communication
> > > volume model which is a hypergraph and a hypergraph partitioner to
> > > reach the desired effect; anything else would be wasting the hardware.
> > You're absolutely right. PETSc has an interface with ParMETIS for this
> > purpose. I don't know much about it or its data structures at this point, but
> > plan to learn soon, and maybe package ParMETIS for Debian, and make at least
> > petsc2.0.29-dev depend on it...
> Almost forgot to say: ParMETIS partitions graphs, not hypergraphs. But if you
> could make a graph model for communication volume then you could use that.
> I've seen some ASCI class applications do such tricks.
Really? In this context then, I'm not sure what the difference is between a graph
and a hypergraph. See above for the communication volume estimation from the
> PetSC's implementation
> of parallel Newton-Krylov solvers are state-of-the-art, so it's only logical
> to use it for real world projects.
Indeed, and they are doing a good job of keeping it state-of-the-art. (My package of
last Friday's 2.1.0 release was just uploaded...)
> > > Making that real-time for a timestepping simulation would be a great
> > > challenge.
> > What's interesting is that the time required to do a single (semi-)implicit
> > timestep can be much greater than the time required to repartition the mesh!
> > See for example a writeup of a Lagrangian-Eulerian model of blood cell
> > deformation in a flow field at:
> > http://www.cs.cmu.edu/~oghattas/papers/sc2000/sc2000.pdf
> Well, you should be ideally using an FM refinement over the partition
> of the last interval (why am I mentioning this? might be a good idea for
> a decent paper :P)
Don't worry, I won't scoop your paper, but can't speak for the rest of the list. :-)
BTW, what's FM refinement?
> > They actually repartition during each timestep! (Using a much more primitive
> > algorithm than ParMETIS.) But visualizing this should be pretty trivial, just
> > loop through the elements to generate triangles, just as I currently loop
> > through the finite difference grid as if it were made of linear hexahedral
> > elements. Unless I'm misunderstanding what your friend did...
> The specifics of the algorithms vary with their design. A strictly object
> space vis. algorithm would be very different from an image space algorithm.
> Now, if you'd just generate triangles, that would make a lot of triangles
> to send to a processor to draw. That's why I think making it real-time
> parallel is difficult. How will you render it? Even if you want to use
> OpenGL at the display node, then you will have to do some preprocessing
> on those triangles (by considering resolution, shading, etc.) before you
> send them over to the display node.
Indeed. As mentioned above, it currently relies on the display node to do quite a
lot. Accelerated OpenGL is quite fast these days, but even so, it doesn't scale. :-)
So here's an object space algorithm outline:
* Each node is told to draw one or more (think stereo, or lens array) RGBA bitmap
image(s) of its piece of the model. The size of the image(s) is/are fixed by
the size of the canvas at node 0, and the (per-image) transformation is
specified by the user.
* The nodes draw their pieces of the image(s), and send the (lossless-compressed?)
results to node 0.
* Node 0 then layers the input from each node to produce a composite image.
The problem with this is that if there's any curvature to the boundaries between the
regions on the nodes, then one cannot strictly order the subimages from the nodes,
that is, one node's data might end up both in front of and behind another's. So this
only works for a simple arrangement like a cartesian grid. Oh well.
How about a hybrid object-image space algorithm:
* Each node calculates the shading and location of the set of triangles
represented by its local data set for each image, based on the per-image
* We then partition the pixels of the image onto the various nodes (doing this
right- putting approximately equal numbers of triangles on each image node-
could be tricky), and redistribute the triangles from the data nodes where they
were calculated to the image nodes where they will be displayed. Note some
triangles will have to be sent to more than one image node (this could get messy
under some circumstances).
* The image nodes then layer and render the triangles, and send sub-images back to
the display node for final (trivially simple) assembly.
All of this works for a sort of luminance/absorption model, whether with contour
surface triangles or volume "fog" type of visualization (which I think Data Explorer
is based on). This much is pretty straightforward. If we want to compute shadows as
well, or do raytracing, things get a lot more complicated and a lot slower...
But fun things like drawing color tubes representing streamlines could fit into this
model without much hassle.
> That's why my friend's thesis was half-theoretical. You first partition
> the hypergraph model on a single proc. with Umit's hypergraph's partitioner,
> *then* you distribute the data and render. Of course, the scalability
> is not that great. :)
True. This is why I was thinking ParMETIS for the repartitioning of each timestep,
which you have to do for the calculation anyway.
> And just out of curiosity: What will you be using this visualization for?
This will be used for a few things:
* Real-time feedback from the simulation, as a precursor to real-time control.
* Complex thermal/fluid/multi-phase flow education: students develop a "feel" for
complex fluid behavior by remotely interacting in real time with a simulation on
a Beowulf cluster.
* Making pretty movies for conference presentations and electronic publications.
> Because I'd tried my hand on extending Overblown to PetSC on MPI. If that's
> CFD you're working on, it might be a nice place to look at because Overblown
> is a really cute framework.
I've never heard of Overblown, what's that? A Google search for "overblown
visualization" didn't turn up anything...
Thanks again for the feedback, this is fun!
GPG fingerprint: D54D 1AEE B11C CE9B A02B C5DD 526F 01E8 564E E4B6
Welcome to the best software in the world today cafe!