Re: Packaging Mauve and libClustalW

To: Andreas Tille <tillea@rki.de>
Cc: Robert Edgar <bob@drive5.com>, "'Todd J Treangen'" <treangen@gmail.com>, Debian Med Project List <debian-med@lists.debian.org>
Subject: Re: Packaging Mauve and libClustalW
From: Aaron Darling <darling@cs.wisc.edu>
Date: Fri, 14 Mar 2008 11:06:13 +1000
Message-id: <47D9CF85.3050106@cs.wisc.edu>
In-reply-to: <alpine.DEB.1.00.0803132217170.26913@wr-linux02>
References: <1025D17C85DC489D809E0109DE62A763@big> <alpine.DEB.1.00.0803132217170.26913@wr-linux02>

Hello all,
some replies below...

Andreas Tille wrote:

On Thu, 13 Mar 2008, Robert Edgar wrote:
Sounds like you're suggesting I maintain a library version of muscle, i.e. something that can be linked with a C/C++ program or whatever. I'm open to discussing this, but at first glance it strikes me as being a /lot/ of work
for the benefit of only a handful of users.
Well, Aaron obviously has experience in doing so and I have converted
a couple of projects to build a library using libtool as well.  So this
is our offer to take over the grunt work of the conversion.

Yes, I would put forward that most of the hard work in refactoring muscle into a library has already been done. It's true that there aren't (yet) many users of muscle-as-a-library, but if you count all the end-users of Mauve who indirectly derive benefit from having muscle-as-a-library then you would be counting quite a few users.

It would require a lot of
documentation for my (ugly, badly designed) functions and classes


Well, the fact that Aaron used it as a lib is proof enough that it is not
that bad. ;-)


hehe, yep, it's certainly better than my spaghetti code.

Regarding the documentation, doxygen can take over a reasonable amount.
Those users who do not understand will just use your executable and those
who would like to understand will read the code.

I can add a Doxyfile to the library and script automatic generation of the docs as I have done for my other projects.

and would
do bad things like pollute the global namespace unless I did a lot of
cleaning up.


To what extent do you expect namespace pollution?

Namespace pollution can be solved rather trivially by putting all the muscle code inside its own namespace. Doing so would involve adding something like


namespace muscle {

to the top of every header file (with a matching closing brace at the bottom),
and adding something like:

using namespace muscle;

to the top of every source file.

I am happy to take care of such details.
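
As a rough illustration of the namespace approach described above, here is a minimal sketch. The function name ToUpper is made up for the example and is not taken from the muscle sources; it only stands in for any symbol that currently lives in the global namespace.

```cpp
#include <string>

// --- What a library header might look like after the change ---
namespace muscle {

// Hypothetical helper, not a real muscle function. It stands in for any
// symbol that used to be declared in the global namespace.
inline std::string ToUpper(std::string s) {
    for (char &c : s) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
    return s;
}

} // namespace muscle

// --- Client code that happens to define the same name globally ---
// Because the library symbol now lives in muscle::, the two do not collide.
inline std::string ToUpper(const std::string &s) { return "client:" + s; }

// Qualified names pick the intended function explicitly.
inline std::string demo_library_call() { return muscle::ToUpper("acgt"); }
inline std::string demo_client_call() { return ::ToUpper("acgt"); }
```

The "using namespace muscle;" line belongs only in the library's own source files, so that existing code keeps compiling unchanged. Client programs would refer to muscle::ToUpper explicitly, as above.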

I'd need to understand why fork() is not a solution. If fork()
isn't practical for some reason then a pretty general hack might be to
re-name my main() to MUSCLE_main() then call MUSCLE_main() from your program with appropriate argc, argv. I do my best to keep backwards compatibility with command-line options for the benefit of people with scripts. It should be straightforward to capture output from muscle by reading output files or via a pipe; if there are problems with this then that might be a reasonable thing for me to fix & maintain in future versions. There is a good chance
that memory leaks will be a problem; I don't have to care about freeing
everything if main() is called only once. I'm not sure how much time I'd be willing to invest in fixing that, but if someone (Aaron?) wanted to take on the challenge of using a memory leak tool to find all the leaks then I might
be willing to commit to fixing it & keeping it clean going forward.
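
For illustration, calling a renamed entry point from another program could look roughly like the sketch below. The MUSCLE_main body here is only a stub so the example compiles on its own; the real function would run the aligner. The wrapper name run_muscle and the file names are assumptions made for the example.

```cpp
#include <string>
#include <vector>

// Stub standing in for muscle's renamed main(). This is NOT the real
// implementation: it only reports success (0) when an "-in" option is present.
int MUSCLE_main(int argc, char *argv[]) {
    for (int i = 1; i < argc; ++i) {
        if (std::string(argv[i]) == "-in") return 0;
    }
    return 1;
}

// Hypothetical wrapper: builds a mutable argc/argv pair from a list of
// option strings and passes it to the entry point.
int run_muscle(const std::vector<std::string> &args) {
    std::vector<std::string> storage;
    storage.push_back("muscle");  // argv[0] is the program name by convention
    storage.insert(storage.end(), args.begin(), args.end());

    std::vector<char *> argv;
    for (std::string &s : storage) argv.push_back(&s[0]);
    argv.push_back(nullptr);      // argv[argc] must be a null pointer

    return MUSCLE_main(static_cast<int>(storage.size()), argv.data());
}
```

A caller would then write something like run_muscle({"-in", "seqs.fa", "-out", "seqs.afa"}). Note that this keeps the command-line interface as the only API, so any state that main() never frees is leaked again on every call.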

In fact I had been using fork/exec with muscle for quite a while. There are several reasons why that approach became unappealing. First, the system overhead of launching a separate muscle process becomes quite large when many thousand such processes must be launched in a short period of time. Second, some parameters such as a custom substitution matrix required a file input. Third, users would occasionally forget to move the muscle binary around with the mauve aligner binaries which depend on it. If muscle isn't in the search path or in the same directory as the binaries which depend on it, it becomes practically impossible to find and launch the executable. A related problem is executable versioning. Libraries generally make versioning explicit in a standardized way, whereas a muscle binary is usually just called "muscle" whether it's version 3.52 or 3.6 or 3.7.
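
To make the fork/exec pattern concrete, here is a minimal POSIX sketch of what each alignment call involves. This is not Mauve's actual code; the helper name run_child is made up for the example.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Launch an external program and wait for it to finish.
// Returns the child's exit status, 127 if the program could not be executed
// (for example because it is not in the search path), or -1 if fork/wait
// failed or the child was killed by a signal.
int run_child(const std::vector<std::string> &args) {
    if (args.empty()) return -1;

    std::vector<std::string> storage(args);
    std::vector<char *> argv;
    for (std::string &s : storage) argv.push_back(&s[0]);
    argv.push_back(nullptr);

    pid_t pid = fork();            // one new process per call
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());  // searches PATH for the binary
        _exit(127);                    // only reached if exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Every call pays for a fork, an exec, and a wait, and fails outright when the binary cannot be found. A linked library avoids both costs.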

I think we all agree that refactoring muscle into a library is/was no small task. Originally I did not want to do it because I knew it would be tedious, but eventually the potential benefits outweighed the costs.

In the process of doing so, I did indeed find many memory leaks and even references to uninitialized memory, and have fixed all that I found. We ran the program through valgrind to detect memory errors. I can't guarantee the program is error free with respect to memory issues, but at least the code paths used by Mauve are clean.

I have also added a few other little features like anchored profile-profile alignment, fixed some bugs in the -refinew option, and probably did some other things that I'm not immediately recalling.

If you (Robert) are amenable to the idea, I can send you a patch with my changes for you to review.


-Aaron
