
Re: OTF conversion without OpenOffice



On Thu, 01 Jul 2010 10:03:28 -0400, brownh wrote:

> Camaleón writes:
> 
>> (...)
>>
>> If it's a simple file (just plain text) you can extract (unzip) the
>> .docx into *.xml data for a direct view or convert into another
>> suitable format.
> 
> Camaleón, I'm afraid you lost me. The file was .docx, which looks
> binary. As a result, it's MIME'd in the mail message, which makes it
> plain ASCII.

I was referring to the "content" of the .docx file, not the "nature" of 
it :-).

If there are images or tables, they will be difficult to render from the 
xml file (images are only linked, and tables would need a parser). But if 
the .docx file just contains a bunch of text, it is easily readable from 
the resulting xml file.
 
> But apparently you mean that I can run unzip on the .docx file to
> extract *.xml data. This was news to me, for I had no idea that .docx
> was an archive. But I tried it, and a number of things happened. It
> created an empty _rels directory; it created a docProps directory in
> which are app.xml and core.xml, and it created a word/ directory in
> which there are a number of *.xml files. None of these xml files are
> understood by abiword.

Yep. The MS ".docx" format is far from ".odt" flexibility, but it shares 
some features. One is that the files are compressed (zipped) and can be 
easily extracted for raw reading.
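Since the .docx file is just a ZIP archive, its parts can also be listed without unpacking anything to disk. Below is a minimal Python sketch using the standard zipfile module. The function names are made up for this example, and make_sample_package() is a hypothetical helper that only mimics the layout unzip showed (it is not a valid Word document). In a real .docx the apparently empty _rels directory most likely holds a single hidden file, .rels, which a plain ls does not show, and a [Content_Types].xml file sits at the top level.

```python
import io
import zipfile


def list_docx_parts(source):
    """Return the names of all parts inside a .docx package.

    A .docx file is an ordinary ZIP archive, so zipfile can open it.
    `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        # sorted() just gives a stable, readable order.
        return sorted(package.namelist())


def make_sample_package():
    """Hypothetical helper: build a tiny in-memory archive with the same
    layout unzip showed. It is NOT a complete, valid Word document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")  # the "hidden" file
        package.writestr("docProps/app.xml", "<Properties/>")
        package.writestr("docProps/core.xml", "<coreProperties/>")
        package.writestr("word/document.xml", "<document/>")
    buffer.seek(0)  # rewind so the archive can be read back
    return buffer


if __name__ == "__main__":
    for name in list_docx_parts(make_sample_package()):
        print(name)
```

With a real file, list_docx_parts("report.docx") (a made-up filename) returns the same kind of list straight from the archive.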

"document.xml" (inside the word/ directory) is the main file, the one 
that contains the text of the document. Being an XML file, it can be 
read with any editor (console or GUI based) or any browser, because it 
is just plain text. Of course, do not expect the same layout you get 
when the ".docx" file is opened with a word processor, but at least you 
can view the content of the file :-) 
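To pull just the text out of document.xml, a few lines of Python are enough. This is a minimal sketch, not a full converter: it assumes the standard WordprocessingML namespace, keeps only the text runs (<w:t>) of each paragraph (<w:p>), and drops formatting, images and table structure, so each table cell comes out as its own line. Corner cases such as text boxes are not handled. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of WordprocessingML, the XML vocabulary used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def paragraphs_from_xml(xml_data):
    """Return the plain text of each paragraph (<w:p>) in document.xml.

    Only the text runs (<w:t>) are kept; everything else is ignored.
    `xml_data` may be a str or bytes.
    """
    root = ET.fromstring(xml_data)
    paragraphs = []
    for para in root.iter(W + "p"):
        # A paragraph is split into runs; join their text back together.
        text = "".join(node.text or "" for node in para.iter(W + "t"))
        paragraphs.append(text)
    return paragraphs


def docx_paragraphs(path):
    """Open a .docx file and extract the paragraphs of its main body."""
    with zipfile.ZipFile(path) as package:
        return paragraphs_from_xml(package.read("word/document.xml"))
```

Usage, with a made-up filename: `for line in docx_paragraphs("letter.docx"): print(line)` prints the document one paragraph per line, which is roughly what you would see by reading document.xml by hand, minus the tags.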

Greetings,

-- 
Camaleón

