Re: Packaging Mauve and libClustalW
Hello all,
some replies below...
Andreas Tille wrote:
On Thu, 13 Mar 2008, Robert Edgar wrote:
Sounds like you're suggesting I maintain a library version of muscle, i.e.
something that can be linked with a C/C++ program or whatever. I'm open to
discussing this, but at first glance it strikes me as being a /lot/ of work
for the benefit of only a handful of users.
Well, Aaron obviously has experience in doing so and I have converted
a couple of projects to build a library using libtool as well. So this
is our offer to take over the grunt work of the conversion.
Yes, I would put forward that most of the hard work in refactoring
muscle into a library has already been done. It's true that there
aren't (yet) many users of muscle-as-a-library, but if you count all the
end-users of Mauve who indirectly derive benefit from having
muscle-as-a-library then you would be counting quite a few users.
It would require a lot of
documentation for my (ugly, badly designed) functions and classes
Well, the fact that Aaron used it as a lib is proof enough that it is not
that bad. ;-)
hehe, yep, it's certainly better than my spaghetti code.
Regarding the documentation, doxygen can take over a reasonable amount.
Those users who do not understand will just use your executable and those
who would like to understand will read the code.
I can add a Doxyfile to the library and script automatic generation of
the docs as I have done for my other projects.
and would
do bad things like pollute the global namespace unless I did a lot of
cleaning up.
To what extent do you expect namespace pollution?
Namespace pollution can be solved rather trivially by putting all the
muscle code inside its own namespace. Doing so would involve adding
something like:
    namespace muscle {
to the top of every header file
and adding something like:
    using namespace muscle;
to the top of every source file.
I am happy to take care of such details.
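A minimal sketch of what that wrapping might look like, with a header and its source file shown in one listing. The names Seq and CountGaps are hypothetical and are not taken from the muscle sources; they only stand in for a class and a free function that used to be global.

```cpp
#include <cstddef>
#include <string>

// ---- what a header might look like (hypothetical names) ----
namespace muscle {

// A class that previously lived in the global namespace.
class Seq {
public:
    explicit Seq(const std::string& residues);
    std::size_t Length() const;
private:
    std::string residues_;
};

// A free function that previously lived in the global namespace.
std::size_t CountGaps(const std::string& row);

} // namespace muscle (the closing brace goes at the bottom of the header)

// ---- what the matching source file might look like ----
using namespace muscle;  // existing code keeps using unqualified names

// Member definitions find Seq through the using-directive.
Seq::Seq(const std::string& residues) : residues_(residues) {}
std::size_t Seq::Length() const { return residues_.size(); }

// Free functions must be defined with a qualified name (or inside a
// namespace block); an unqualified definition here would declare a new,
// unrelated ::CountGaps instead of defining muscle::CountGaps.
std::size_t muscle::CountGaps(const std::string& row) {
    std::size_t gaps = 0;
    for (char c : row) {
        if (c == '-') ++gaps;
    }
    return gaps;
}
```

A client program would then refer to muscle::Seq and muscle::CountGaps, leaving its own global names untouched.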
I'd need to understand why fork() is not a solution. If fork() isn't
practical for some reason then a pretty general hack might be to re-name
my main() to MUSCLE_main() then call MUSCLE_main() from your program with
appropriate argc, argv. I do my best to keep backwards compatibility with
command-line options for the benefit of people with scripts. It should be
straightforward to capture output from muscle by reading output files or
via a pipe; if there are problems with this then that might be a
reasonable thing for me to fix & maintain in future versions. There is a
good chance that memory leaks will be a problem; I don't have to care
about freeing everything if main() is called only once. I'm not sure how
much time I'd be willing to invest in fixing that, but if someone (Aaron?)
wanted to take on the challenge of using a memory leak tool to find all
the leaks then I might be willing to commit to fixing it & keeping it
clean going forward.
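The MUSCLE_main() idea could be sketched roughly as follows. This is only an illustration: the MUSCLE_main body below is a stand-in so the listing compiles on its own, and RunMuscle is a hypothetical helper that builds argc/argv from a list of option strings.

```cpp
#include <string>
#include <vector>

// Stand-in for muscle's renamed entry point. In the real program this
// would be the former main(); here it only checks that some options were
// passed, so that the sketch is self-contained.
int MUSCLE_main(int argc, char* argv[]) {
    (void)argv;
    return argc > 1 ? 0 : 1;
}

// Hypothetical helper: build a conventional argc/argv pair from a list of
// option strings and invoke the entry point in-process.
int RunMuscle(const std::vector<std::string>& options) {
    std::vector<std::string> storage;
    storage.push_back("muscle");            // argv[0]: program name
    storage.insert(storage.end(), options.begin(), options.end());

    std::vector<char*> argv;
    for (std::string& s : storage) {
        argv.push_back(&s[0]);              // mutable, null-terminated
    }
    argv.push_back(nullptr);                // argv[argc] must be null

    return MUSCLE_main(static_cast<int>(storage.size()), argv.data());
}
```

A caller might write RunMuscle({"-in", "seqs.fa", "-out", "aln.afa"}), where the file names are made up. Anything the entry point allocates and never frees would accumulate across repeated calls, which is the memory leak concern raised above.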
In fact I had been using fork/exec with muscle for quite a while. There
are several reasons why that approach became unappealing. First, the
system overhead of launching a separate muscle process becomes quite
large when many thousand such processes must be launched in a short
period of time. Second, some parameters such as a custom substitution
matrix required a file input. Third, users would occasionally forget to
move the muscle binary around with the mauve aligner binaries which
depend on it. If muscle isn't in the search path or in the same
directory as the binaries which depend on it, it becomes practically
impossible to find and launch the executable. A related problem is
executable versioning. Libraries generally make versioning explicit in
a standardized way, whereas a muscle binary is usually just called
"muscle" whether it's version 3.52 or 3.6 or 3.7.
I think we all agree that refactoring muscle into a library is/was no
small task. Originally I did not want to do it because I knew it would
be tedious, but eventually the potential benefits outweighed the costs.
In the process of doing so, I did indeed find many memory leaks and even
references to uninitialized memory, and have fixed all that I found. We
ran the program through valgrind to detect memory errors. I can't
guarantee the program is free of memory errors, but at least the code
paths used by Mauve are.
I have also added a few other little features like anchored
profile-profile alignment, fixed some bugs in the -refinew option, and
probably did some other things that I'm not immediately recalling.
If you (Robert) are amenable to the idea, I can send you a patch with
my changes for you to review.
-Aaron