
Re: MIME and Applied Biosystems chromatograms.



On Monday, 21.01.2008 at 18:48 +0900, Charles Plessy wrote:
> On Mon, Jan 21, 2008 at 04:44:52AM +0100, Daniel Leidert wrote:

[..]
> > Ok. Something comes to mind that I forgot to tell you. Vendors are
> > allowed to use the "vnd." prefix (this is also stated in one of the
> > related MIME RFCs). For example, check
> > /usr/share/mime/packages/freedesktop.org.xml for this prefix; OO.o
> > uses it. So I think maybe a better name would be:
> > 
> > application/vnd.appliedbiosystems-abif
> 
> I have updated the files to use application/vnd.appliedbiosystems.abif.
> However, I am unsure that this will be the definitive solution as, after
> checking the difference between .fsa ("fragment analysis") and .ab1
> (chromatograms) files, I have the impression that they contain different
> data.

Ok, so you would probably go for:

application/vnd.appliedbiosystems.abi
application/vnd.appliedbiosystems.fsa

> So although their encoding is the same, applications that can open
> .ab1 chromatograms usually cannot process .fsa files. Worse, some
> people apparently used the .fsa suffix for FASTA sequence files.

Yes, true. But FASTA and this format can be told apart easily. FASTA
files are plain text and Applied Biosystems .fsa files are binary.
Detection routines should be able to distinguish them just because of
the sub-class-of text/plain tag for FASTA.
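
To illustrate the text/binary distinction, here is a minimal Python
sketch. It assumes that ABIF files begin with the four ASCII bytes
"ABIF" and that FASTA files begin with a ">" header line. The function
name guess_fsa_kind is made up for this example; it is not part of any
existing tool.

```python
def guess_fsa_kind(data: bytes) -> str:
    """Classify the leading bytes of a .fsa file (hypothetical helper)."""
    # Assumption: ABIF containers start with the ASCII signature "ABIF".
    if data[:4] == b"ABIF":
        return "abif"
    # Assumption: a FASTA record starts with '>' after optional whitespace.
    if data.lstrip()[:1] == b">":
        return "fasta"
    return "unknown"
```

Reading the first few bytes of the file is enough for this check, so it
should not slow detection down.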

> For the moment, the XML file detects .ab1 chromatograms by extension,
> and ABIF files by magic. If we do not manage to distinguish ABIF
> chromatograms from ABIF fragment analysis files, I am afraid that the
> best solution would be to remove the magic detection?

Not sure. It would be best if there were some kind of magic byte
(sequence) that distinguishes these formats. However, there doesn't
seem to be an SDK or similar that contains the necessary information.
The PDF only describes the ABIF format.

> Can we look for character strings within the whole file without
> compromising the speed of the detection?

Yes. But the search routines are limited to a search between two given
offsets. There is no syntax for "begin" and "end" of a file.
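
A rough Python sketch of what such a bounded search amounts to. It
assumes, as I understand the magic rules, that a match may begin at any
offset in the inclusive range start..end. The helper name
find_in_window is hypothetical.

```python
def find_in_window(data: bytes, needle: bytes, start: int, end: int) -> bool:
    """Return True if needle begins at some offset in [start, end]."""
    # bytes.find() limits where the match may *end*, so the upper bound
    # is extended by len(needle) to allow a match starting at `end`.
    return data.find(needle, start, end + len(needle)) != -1
```

The cost grows with the width of the window, so a very wide range would
work against the goal of fast detection.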

> It seems that some "tags" in
> the ABIF files can be unique to either .fsa or .ab1 files.

Maybe the EMBOSS people know of magic bytes that distinguish .abi
from .fsa files?
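
In the meantime, one way to compare the tags of a few sample files is to
list the names in the ABIF directory. The Python sketch below is based
on my reading of the ABIF layout and should be treated as an assumption:
a 4-byte "ABIF" signature, a 2-byte version, then one 28-byte big-endian
directory entry that gives the number of entries and the offset of the
directory. The function name abif_tag_names is made up. Which tag names
actually differ between .ab1 and .fsa files is not something I have
verified.

```python
import struct

# Assumed entry layout: name, number, element type, element size,
# number of elements, data size, data offset, data handle (28 bytes).
ENTRY = struct.Struct(">4sihhiiii")


def abif_tag_names(data: bytes) -> set:
    """Return the set of tag names found in the ABIF directory."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The header entry follows the signature (4 bytes) and version (2 bytes).
    _, _, _, elem_size, count, _, dir_offset, _ = ENTRY.unpack_from(data, 6)
    names = set()
    for i in range(count):
        name = ENTRY.unpack_from(data, dir_offset + i * elem_size)[0]
        names.add(name.decode("ascii", "replace"))
    return names
```

Comparing the sets returned for known .ab1 and known .fsa samples would
show whether any tag is reliably present in only one of the two kinds.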

Regards, Daniel

