
Re: Magic for OpenOffice (file)



On Tue, May 25, 2004 at 10:17:33AM -0500, Jacob S. wrote:
>On Tue, 25 May 2004 07:56:28 -0700
>keith@ahapala.net (Keith Nasman) wrote:
>
>> On Tue, May 25, 2004 at 09:47:16AM -0400, Gregory Seidman wrote:
>> > On Tue, May 25, 2004 at 08:33:19AM +0200, Magnus Therning wrote:
>> > } A quick search in Google didn't reveal any solution (only found
>> > } one reference, in Japanese).
>> > } 
>> > }  $ file -i file.sxw
>> > }  file.sxw: application/x-zip
>> > } 
>> > } It would be really nice if 'file' could give the correct type for
>> > } OpenOffice documents. Does anyone have an entry for /etc/magic that
>> > } makes it happen?
>> > 
>> > This is a deeper problem than just OO.org. There has been
>> > dissatisfaction with file's reporting for
>> > compressed/gzipped/bzipped/zipped files for a good long time, and
>> > the idea of having file actually decompress some of the data to get
>> > a more accurate result has come up in the past.
><snip>
>> > 
>> > Ultimately, I'd love to see it done, and I encourage you to get
>> > programming.
>> 
>> What's kind of ironic is that the first line of the files states the
>> MIME type in ASCII.
>> 
>> keith@r31:~/docs$ strings test.sxw | head -n1
>> mimetypeapplication/vnd.sun.xml.writerPK
>> keith@r31:~/docs$ strings test.sxc | head -n1
>> mimetypeapplication/vnd.sun.xml.calcPK
>
>Actually, only OO.org 1.1 declares its MIME type. Doing a strings
>test.sxw | head -n1 on a file created by OO.org 1.0 simply returns
>"content.xml". 
>
>That said though, I _prefer_ file showing it as a compressed file. After
>all, if I wanted to read the content, I would first need to uncompress
>it, or use a utility that will decompress text on the fly (zless comes
>to mind). Then there's also the fact that it's not just a single file
>compressed in a zip archive. If you run unzip on an OO.org 1.1 file, it
>extracts the following files:
>
>mimetype                
>content.xml             
>styles.xml              
>meta.xml                
>settings.xml            
>META-INF/manifest.xml
>
>Running unzip on a OO.org 1.0 file returns much the same results,
>_except_ it does not have the mimetype file in it.
>
>I suppose there might be room to expand the information file returns,
>such as what type of file is inside the zip file, but if the zip file
>has very many individual files in it, this could take forever, produce
>a lot of output, etc. So, I think file accomplishes its goal; I run
>file to know what tool(s) I need to work with that file. The first tool
>needed is a decompression utility. Then I can run file again on the
>individual .xml files to see that they are "XML 1.0 document text".
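
Because OO.org 1.1 writes 'mimetype' as the first member of the archive and stores it uncompressed (which is why 'strings' finds it), the declared type can be read from the first zip local file header without unzipping anything. Below is a minimal POSIX sh sketch, assuming the standard 30-byte zip local header layout and a member that is stored rather than deflated; 'le_uint' and 'oo_mimetype' are hypothetical helper names, not part of any existing tool.

```shell
#!/bin/sh
# le_uint FILE OFFSET NBYTES
# Print the little-endian unsigned integer stored at OFFSET in FILE.
# Zip header fields are little-endian; od prints the raw bytes in
# decimal and awk combines them, so host byte order does not matter.
le_uint() {
    od -An -tu1 -j "$2" -N "$3" "$1" 2>/dev/null |
        awk 'BEGIN { v = 0; m = 1 }
             { for (i = 1; i <= NF; i++) { v += $i * m; m *= 256 } }
             END { print v }'
}

# oo_mimetype FILE
# Print the contents of the "mimetype" member if it is the first entry
# of the zip archive and is stored uncompressed (the OO.org 1.1 layout).
# Returns 1 otherwise, e.g. for OO.org 1.0 files, which lack it.
oo_mimetype() {
    f=$1
    # Local file header signature "PK\003\004" read as a little-endian
    # 32-bit value is 0x04034b50 = 67324752.
    [ "$(le_uint "$f" 0 4)" -eq 67324752 ] || return 1
    method=$(le_uint "$f" 8 2)     # compression method, 0 = stored
    size=$(le_uint "$f" 18 4)      # compressed size of the member
    namelen=$(le_uint "$f" 26 2)   # length of the member's file name
    extralen=$(le_uint "$f" 28 2)  # length of the extra field
    name=$(dd if="$f" bs=1 skip=30 count="$namelen" 2>/dev/null)
    [ "$name" = mimetype ] && [ "$method" -eq 0 ] || return 1
    # The stored data follows the file name and the extra field.
    dd if="$f" bs=1 skip=$((30 + namelen + extralen)) count="$size" 2>/dev/null
    echo
}
```

For an OO.org 1.1 Writer file, 'oo_mimetype test.sxw' would be expected to print application/vnd.sun.xml.writer; for an OO.org 1.0 file it returns a non-zero status, so a caller can fall back to plain 'file'.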

Well, in this particular case I really don't care about the archiving
format of the file. To me a .sxw file is an OpenOffice.org file, not a
zip-archive. Unzipping it won't allow me to work with the file; starting
OO.o will.

The source of all this is a small python script I wrote that saves files
attached in incoming emails and replaces the attachment with a
message/external-body attachment. (I get quite a lot of emails with
attachments and many times it is more interesting to keep the email
longer than the attachment.)
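
The Python script itself isn't shown here. As a rough illustration of what it produces, a message/external-body part with access-type=local-file (RFC 2046) has an outer header naming the saved file and an encapsulated header describing the attachment. A minimal sh sketch follows; 'external_body_part' is a hypothetical name, and the Content-ID is taken as an argument because the RFC 2046 examples include one in the encapsulated header.

```shell
#!/bin/sh
# external_body_part PATH MIMETYPE CONTENT_ID
# Print a message/external-body part (RFC 2046, access-type local-file)
# that stands in for an attachment already saved at PATH.
# Hypothetical helper for illustration; not the script from this post.
external_body_part() {
    path=$1
    type=$2
    cid=$3
    # Outer header: the part's own type and where the data lives
    printf 'Content-Type: message/external-body; access-type=local-file;\n'
    printf '\tname="%s"\n' "$path"
    printf '\n'
    # Encapsulated header: describes the saved attachment itself
    printf 'Content-Type: %s\n' "$type"
    printf 'Content-ID: %s\n' "$cid"
    printf '\n'
}
```

The name parameter is what a mailcap entry sees as %{name}, and access-type is what %{access-type} expands to.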

In my attempts to get mutt to support external-body parts I thought of
a script something like this:

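 handleext:
   #! /bin/sh
   # Usage: handleext FILE
   # Ask file(1) for FILE's MIME type, then let metamail(1) run the
   # matching mailcap entry (-b: FILE is a bare body, not a full
   # RFC 822 message; -c: the content type to use).
   my_file="$1"
   metamail -b -c "$(file -ib "$my_file")" "$my_file"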

And then put an entry like this in my mailcap:

 ~/.mailcap:
   message/external-body; ${HOME}/bin/handleext %{name};\
           test=test %{access-type} = local-file

This would let me utilize the already existing mailcap entries for files
(I am quite happy with them, and use them extensively from mutt). The
problem is that this simple setup won't work with OO.o.
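
One way to make 'file' recognise OO.org 1.1 documents is a magic(5) entry keyed on the stored mimetype member, whose name and contents start at offset 30 of the file. The sh sketch below appends such entries to a magic file of your choosing. It is an assumption-laden sketch: 'write_oo_magic' is a hypothetical name, the entries will not match OO.org 1.0 files (no mimetype member), they check only offset 30 and not the 'PK' signature at offset 0, and the '!:mime' lines that supply the 'file -i' answer are understood by current file releases, while older 4.x releases read MIME types from a separate magic.mime file instead.

```shell
#!/bin/sh
# write_oo_magic FILE
# Append magic(5) entries for OO.org 1.1 documents to FILE.
# Each entry matches the stored "mimetype" member (name followed
# directly by its contents) at offset 30 of the zip file.
write_oo_magic() {
    cat >> "$1" <<'EOF'
# OO.org 1.1: uncompressed "mimetype" member first in the zip archive
30      string  mimetypeapplication/vnd.sun.xml.writer   OpenOffice.org 1.x Writer document
!:mime  application/vnd.sun.xml.writer
30      string  mimetypeapplication/vnd.sun.xml.calc     OpenOffice.org 1.x Calc spreadsheet
!:mime  application/vnd.sun.xml.calc
30      string  mimetypeapplication/vnd.sun.xml.impress  OpenOffice.org 1.x Impress presentation
!:mime  application/vnd.sun.xml.impress
30      string  mimetypeapplication/vnd.sun.xml.draw     OpenOffice.org 1.x Draw drawing
!:mime  application/vnd.sun.xml.draw
EOF
}
```

Since the string test is a prefix match, the Writer entry would also catch Writer master documents (application/vnd.sun.xml.writer.global); a longer pattern placed before it would be needed to tell them apart.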

I have managed to get 'file' to recognise OO.o files, but it seems
'file' has a rather silly behaviour:

 1. The first match is chosen, and /etc/magic is loaded last, meaning
 that /etc/magic doesn't override things in the default magic files
 (/usr/share/misc/file/).
 2. Setting the MAGIC environment variable means that the default magic
 files won't be used at all.

These deficiencies can of course be worked around with some clever
scripting.
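
One such piece of scripting: give 'file' a colon-separated list with -m, putting a personal magic file ahead of the system database so that, with the first-match behaviour described above, the personal entries are tried first. A minimal sh sketch; 'myfile', 'magic_path', USER_MAGIC and SYSTEM_MAGIC are hypothetical names, and the default SYSTEM_MAGIC path is an assumption, since the database location differs between file versions and distributions (see the FILES section of file(1) on your system).

```shell
#!/bin/sh
# Hypothetical wrapper: consult a personal magic file before the
# system database by passing file(1) a colon-separated list with -m.
# SYSTEM_MAGIC is an assumed default; adjust it for your system.
: "${USER_MAGIC:=$HOME/.magic}"
: "${SYSTEM_MAGIC:=/usr/share/misc/magic}"

# Print the -m argument: the personal file first, if it is readable.
magic_path() {
    if [ -r "$USER_MAGIC" ]; then
        printf '%s:%s\n' "$USER_MAGIC" "$SYSTEM_MAGIC"
    else
        printf '%s\n' "$SYSTEM_MAGIC"
    fi
}

# Run file(1) with the combined list; all other arguments pass through.
myfile() {
    file -m "$(magic_path)" "$@"
}
```

With this in place, handleext could call 'myfile -ib' instead of 'file -ib', and the MAGIC variable never needs to be set.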

/M

-- 
Magnus Therning                    (OpenPGP: 0xAB4DFBA4)
magnus@therning.org
http://magnus.therning.org/

I'm still freaked out about all this. I just wrote a (bleeping)
anthropology paper.
     -- Eric Raymond
