
Re: Packaging Mauve and libClustalW



Hello all,
some replies below...

Andreas Tille wrote:
On Thu, 13 Mar 2008, Robert Edgar wrote:

Sounds like you're suggesting I maintain a library version of muscle, i.e. something that can be linked with a C/C++ program or whatever. I'm open to discussing this, but at first glance it strikes me as being a /lot/ of work
for the benefit of only a handful of users.

Well, Aaron obviously has experience in doing so and I have converted
a couple of projects to build a library using libtool as well.  So this
is our offer to take over the grunt work of the conversion.

Yes, I would put forward that most of the hard work in refactoring muscle into a library has already been done. It's true that there aren't (yet) many users of muscle-as-a-library, but if you count all the end-users of Mauve who indirectly benefit from having muscle-as-a-library then you would be counting quite a few users.


It would require a lot of
documentation for my (ugly, badly designed) functions and classes

Well, the fact that Aaron used it as a lib is proof enough that it is not
that bad. ;-)

hehe, yep, it's certainly better than my spaghetti code.

Regarding the documentation doxygen can take over a reasonable amount.
Those users who do not understand will just use your executable and those
who would like to understand will read the code.

I can add a Doxyfile to the library and script automatic generation of the docs as I have done for my other projects.


and would
do bad things like pollute the global namespace unless I did a lot of
cleaning up.

To what extent do you expect namespace pollution?

Namespace pollution can be solved rather trivially by putting all the muscle code inside its own namespace. Doing so would involve adding something like

namespace muscle {

to the top of every header file (with a matching closing brace at the bottom)
and adding something like:

using namespace muscle;

to the top of every source file.

I am happy to take care of such details.
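To make the namespace idea concrete, here is a minimal sketch of the header/source pattern collapsed into one file. The function name CountGaps and the client namespace are made up for illustration and are not part of muscle's actual code.

```cpp
#include <cassert>
#include <string>

// Header part: every declaration is wrapped in the namespace.
namespace muscle {
unsigned CountGaps(const std::string &row);   // hypothetical helper
} // namespace muscle (closing brace at the bottom of the header)

// Source part: the using-directive lets existing code keep
// referring to muscle names without the muscle:: prefix.
using namespace muscle;

// A free function still has to be defined with a qualified name (or
// inside namespace muscle { ... }); an unqualified definition here
// would declare a new, unrelated global function.
unsigned muscle::CountGaps(const std::string &row)
{
    unsigned n = 0;
    for (char c : row)
        if (c == '-')
            ++n;
    return n;
}

// Client part: an application can now have its own CountGaps
// without clashing with the library's symbol.
namespace client {
unsigned CountGaps(const std::string &row)
{
    return static_cast<unsigned>(row.size());
}

unsigned GapsViaLibrary(const std::string &row)
{
    return muscle::CountGaps(row);
}

inline void Demo()
{
    assert(GapsViaLibrary("AC--GT-") == 3);
}
} // namespace client
```

The using-directive covers name lookup in the source files, including out-of-class member function definitions. Free-function definitions need the qualified form shown above, so the conversion is mostly mechanical but not purely a two-line change per file.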


I'd need to understand why fork() is not a solution. If fork()
isn't practical for some reason then a pretty general hack might be to
re-name my main() to MUSCLE_main() then call MUSCLE_main() from your program with appropriate argc, argv. I do my best to keep backwards compatibility with command-line options for the benefit of people with scripts. It should be straightforward to capture output from muscle by reading output files or via a pipe; if there are problems with this then that might be a reasonable thing for me to fix & maintain in future versions. There is a good chance
that memory leaks will be a problem; I don't have to care about freeing
everything if main() is called only once. I'm not sure how much time I'd be willing to invest in fixing that, but if someone (Aaron?) wanted to take on the challenge of using a memory leak tool to find all the leaks then I might
be willing to commit to fixing it & keeping it clean going forward.
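The MUSCLE_main() hack described above could look roughly like the following sketch. The MUSCLE_main body here is only a stand-in that checks for the -in and -out options, and RunMuscle is a hypothetical wrapper name; the real entry point would parse the options and run the alignment.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for muscle's main() after renaming. It only checks that both
// -in and -out are given with a value, and returns 0 in that case.
int MUSCLE_main(int argc, char *argv[])
{
    bool haveIn = false, haveOut = false;
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "-in") == 0)
            haveIn = true;
        if (std::strcmp(argv[i], "-out") == 0)
            haveOut = true;
    }
    return (haveIn && haveOut) ? 0 : 1;
}

// Hypothetical wrapper: builds a conventional argc/argv pair from a list
// of options and calls the renamed entry point in-process.
int RunMuscle(const std::vector<std::string> &options)
{
    // Mutable copies, because argv entries are char*, not const char*.
    std::vector<std::string> storage;
    storage.push_back("muscle");               // argv[0], the program name
    storage.insert(storage.end(), options.begin(), options.end());

    std::vector<char *> argv;
    for (std::string &s : storage)
        argv.push_back(&s[0]);
    argv.push_back(nullptr);                   // argv[argc] is null by convention
    assert(argv.size() == storage.size() + 1);

    return MUSCLE_main(static_cast<int>(storage.size()), argv.data());
}
```

A caller would write something like RunMuscle({"-in", "seqs.fa", "-out", "seqs.afa"}). Anything the real main() allocates and never frees would leak once per call, which is the concern raised above.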

In fact I had been using fork/exec with muscle for quite a while. There are several reasons why that approach became unappealing. First, the system overhead of launching a separate muscle process becomes quite large when many thousands of such processes must be launched in a short period of time. Second, some parameters, such as a custom substitution matrix, required a file input. Third, users would occasionally forget to move the muscle binary around with the mauve aligner binaries which depend on it. If muscle isn't in the search path or in the same directory as the binaries which depend on it, it becomes practically impossible to find and launch the executable. A related problem is executable versioning. Libraries generally make versioning explicit in a standardized way, whereas a muscle binary is usually just called "muscle" whether it's version 3.52 or 3.6 or 3.7.
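For reference, the fork/exec-with-a-pipe approach looks roughly like the sketch below on a POSIX system. RunAndCapture is a hypothetical helper name and this is not the code Mauve used; it only illustrates the per-call process launch and what happens when the binary is not found in the search path.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs args[0] (looked up in PATH) and appends its stdout to `output`.
// Returns the child's exit status, 127 if exec failed (for example
// because the binary was not found), or -1 on a system error.
int RunAndCapture(const std::vector<std::string> &args, std::string &output)
{
    assert(!args.empty());
    std::vector<char *> argv;
    for (const std::string &a : args)
        argv.push_back(const_cast<char *>(a.c_str()));
    argv.push_back(nullptr);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();                 // one new process per invocation
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        // Child: route stdout into the pipe, then replace the process image.
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], argv.data());
        _exit(127);                     // only reached if exec failed
    }

    // Parent: read until the child closes its end of the pipe.
    close(fds[1]);
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        output.append(buf, static_cast<size_t>(n));
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Every call pays for a pipe, a fork, an exec and a wait, and the caller learns nothing about which version of the program it actually ran.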

I think we all agree that refactoring muscle into a library is/was no small task. Originally I did not want to do it because I knew it would be tedious, but eventually the potential benefits outweighed the costs.

In the process of doing so, I did indeed find many memory leaks and even references to uninitialized memory, and have fixed all that I found. We ran the program through valgrind to detect memory errors. I can't guarantee the program is free of memory errors, but at least the code paths used by Mauve are.

I have also added a few other little features like anchored profile-profile alignment, fixed some bugs in the -refinew option, and probably did some other things that I'm not immediately recalling.

If you (Robert) are amenable to the idea, I can send you a patch with my changes for you to review.

-Aaron

