
Re: [asmlib] - Library of optimized subroutines coded in assembly language.



Hey Andreas,

On Thu, Nov 6, 2014 at 10:53 AM, Andreas Tille <andreas@an3as.eu> wrote:
>
> Upstream for Debian means a temporary upstream placeholder to use for
> package building, whilst the KMC authors have a discussion with the asmlib
> author, to try to get him to distribute his code in a VCS manner.
> If you look at [1], the code is distributed as a zip file that contains the
> compiled static libraries for several OS's and architectures. The source
> code is another zip file inside the original one. It contains a pdf plus
> other zip files. The source code itself uses a nmake makefile that performs
> cross compilation, through the use of yet another piece of code written by
> the author.

OK, I see.  However, I'd prefer a debian/get-orig-script fetching the
zipfile and extracting the source from there.

Sounds good. Looking into this now.
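
As a rough starting point for such a script, here is a minimal Python sketch of the extraction step only. It assumes the outer zip has already been downloaded. The function name, the inner archive name and the destination directory are all hypothetical and need to be checked against the real upstream zip.

```python
import io
import zipfile
from pathlib import Path


def extract_inner_source(outer_zip, inner_name, dest_dir):
    """Extract the members of a zip archive nested inside another zip.

    outer_zip  -- path to the already downloaded upstream archive
    inner_name -- name of the nested source archive (hypothetical; check the real file)
    dest_dir   -- directory that receives the extracted source files
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(outer_zip) as outer:
        # Read the nested archive into memory; ZipFile needs a seekable file object.
        inner_bytes = outer.read(inner_name)
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        inner.extractall(dest)
        # Return the extracted member names so the caller can build a tarball from them.
        return sorted(inner.namelist())
```

Repacking the result as a versioned orig tarball would still be needed on top of this, and since upstream has no version numbers the version string would have to be made up (for example from the download date).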
 

> There is no versioning system whatsoever for asmlib. The author simply
> states that the most up to date version is always available in [1].

There is a nice guide for upstream developers who tell you things like
this.  You should rather point upstream to

   https://wiki.debian.org/UpstreamGuide#Releases_and_Versions

than trying to fix things on behalf of upstream.

Thanks for the link. Having a look at it as well.
 
  BTW, I wonder what
amount of speed gain KMC authors are expecting from a library written in
assembly?  These days compilers are really optimised.  If I see source
files like memcmp64.asm, memcpy64.asm etc I *really* wonder what I
should expect from an author who fails to comply with some basic rules
like releasing versioned tarballs.  If *I* would try to develop a piece
of software I would not rely on such code, sorry.


It was my initial thought.
Here's what they had to say about it:

Jorge said:
I am not completely sure, having read the readme.txt file provided with the source distribution, whether the source code also includes these libraries:
* asmlib - for fast memcpy operation (http://www.agner.org/optimize/asmlib-instructions.pdf)
* libbzip2 - for support for bzip2-compressed input FASTQ/FASTA files (http://www.bzip.org/)
* zlib - for support for gzip-compressed input FASTQ/FASTA files (http://www.zlib.net/)
 
KMC said:
These libraries are provided with KMC source codes in a binary form. You can find them here:
The ones with "lib" extension are for windows compilation (Visual Studio compiler) and the ones with "a" extension are for linux compilation (g++ compiler).

Jorge said:

I know that zlib is available as a Debian package and it will be easy to simply state it as a dependency for KMC.
Running apt-cache search for libbzip2 I can see the following:
js21@builder2:~/deb_alioth/current/kmc_packaging/temp/kmc$ apt-cache search libbzip2
lib32bz2-1.0 - high-quality block-sorting file compressor library - 32bit runtime
libbz2-1.0 - high-quality block-sorting file compressor library - runtime

I'm not sure whether this is the library you generally use, but I'm sure it would do the same job. We can again state this as a dependency of the KMC package and it will be installed on the user's system at KMC build time.
The library that raises a flag for me is the asmlib. I can find a libasm through apt-cache, but this seems to be referenced as a library for Java. (crazy java)
If this library is essential for KMC, then we will need to ask the Debian Med list if this is already available through Debian. If not, we will probably have to create a package for it as well.

KMC said:

Asmlib is essential in a performance sense. In KMC there is a lot of work with buffers; sometimes these buffers or parts of them must be copied, and asmlib provides quicker copying than the standard implementation.

I am not really sure how Debian packages work, but if it is possible to provide binary libraries inside the package, KMC should work without any dependencies.

END_OF_KMC_JORGE_TALK

 
> > I do not see any point for not using straight upstream
> > repository.
>
> I am hoping that the KMC authors will convince the asmlib author to
> distribute his code through a VCS.

In the sense of what I wrote above I personally would be happy if KMC
authors would use plain C/C++ code and do some serious testing of what
speed they really gain and how maintainable their code would be.

I will see if I can get some evidence from the KMC authors as to why they chose this implementation, rather than native.
 

> >   It seems Git addicts have a lot of fun by cloning things.
>
> I think you're assuming too much here.

Yes, you are right.  I should have checked in advance.  Sorry.

It's OK. Thanks for saying that.
 

> In my view, if the upstream author sees the simplicity of what I have done,
> I believe it would be easier to convince him/her to use whatever VCS. In my
> case Git, because I am now used to it.

OK, that's a valid point.

+1
 

> >     git import-orig --pristine-tar
> >
> > since the repository has no upstream branch.  Please try to follow our
> > team policy as closely as possible.  Otherwise your coworkers will have
> > trouble to
>
> I have used it; I think I just forgot to push it. It's now pushed for asmlib.

OK, confirmed that I was able to pull this.

Sweet!
 

> > > genomes, not as in machine code) IVA written in python by a member of my
> > > team over at the Wellcome Trust Sanger Institute.
> > >
> > > [7] Original upstream - https://github.com/sanger-pathogens/iva
> >
> > Sounds good!
>
> At least something...

Well, it seems my murmuring was a bit depressing to you.  This was not
intended.  Maybe the unusual way of source distribution has backfired
on you since I did not expect this kind of trouble and was not
checking properly.

Don't worry about it. It's all a dynamic process and it's not that it was depressing. You had no clear picture of what was in my head.
I also thought this task would be much simpler than what it's turning out to be.
I would like to see it through though.
 

> I'll work through quilt patches.
> I only removed code to see if I could build the package in my machine.
> And since the nmake makefile is completely useless with make + the only
> libraries useful for Debian would be the 32bit and 64bit Unix libraries
> + I'm not making any changes to the original source file, simply adding a
> Unix Makefile, I thought this would be a simple solution.

Here we are facing another drawback of asmlib usage:  KMC authors are
excluding promising architectures like arm64 and ppc64el which might in
the not so distant future become relevant for tasks in
bioinformatics.

Very valid point. I will bring this point across in the next communication.
I can cc you if you want. If you want to talk to them directly, that could also help.
The two guys I've been talking to are:

KMC developer - Marek.Kokot@polsl.pl
KMC coordinator (I think) - sdeorowicz@gmail.com
 

> > Yes, for sure.  I think the decision to package asmlib separately was
> > drawn in the beginning of this discussion.
> >
>
> This email was sent to both Debian Med and Debian Mentors. I thought it
> best to be as verbose as I could be.

I missed this and I'm now CCing debian-mentors as well.


Cool!
 
Summary: I would try to discuss with KMC developers whether they would
see any chance to make asmlib optional and could provide the full
functionality without this library.  Writing assembly language is to the
best of my knowledge something you did in the 90s of the last century.
Trying to reimplement things like memcmp, memcpy etc is something you
should avoid IMHO.


100% agreed. As above, I approached the subject, and there were arguments for using these reimplementations.
I'll send them an email later today. They are busy writing the next version of KMC and have become slightly unresponsive as of late.
Fingers crossed.

Kind regards,

Jorge

