
Re: Microsoft Does It Again


On Tue, Aug 21, 2018 at 06:28:57PM +0200, tomas@tuxteam.de wrote:
> On Tue, Aug 21, 2018 at 07:02:32PM +0300, Reco wrote:
> > On Tue, Aug 21, 2018 at 05:48:31PM +0200, tomas@tuxteam.de wrote:
> [...]
> > >   tomas@trotzki:~$ apt search ooxml
> > >   Sorting... Done
> > >   Full Text Search... Done
> > >   docx2txt/stable,stable,stable 1.4-0.1 all
> > >     Convert Microsoft OOXML files to plain text
> > 
> > Not relevant. Input is xlsx.
> Well, xlsx *is* OOXML (I like to call it "MOOXML" as in
> "Microsoft's..." -- you get the idea :)

That's like saying that apples and oranges are both fruits: true, but
one does not usually compare apples to oranges.

Both docx and xlsx are zip archives with XML inside, but the XML inside
is structured differently, so applying the parsing rules of one to the
other yields no useful result.

Parsing docx is easy, even I can do it (and did it, actually).
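Roughly what "easy" looks like: a sketch, assuming only Python's
standard library, that pulls paragraph text out of `word/document.xml`
by joining the `w:t` text runs of each `w:p` paragraph. `docx_text` is a
hypothetical name; it ignores tabs, line breaks, headers, footnotes and
anything else outside the main document part.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def docx_text(source):
    """Return body text of a .docx, one line per w:p paragraph."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        # A paragraph's text is split across runs; each run holds w:t nodes.
        lines.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(lines)


# Usage with a hand-made, minimal document part.
demo_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>Hello</w:t></w:r>'
    '<w:r><w:t xml:space="preserve">, world</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
demo_buf = io.BytesIO()
with zipfile.ZipFile(demo_buf, "w") as package:
    package.writestr("word/document.xml", demo_xml)
print(docx_text(demo_buf))
```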
Parsing xlsx with all its gross formulas, macros and arcane date
formats is the definition of pain. I gave up and became a happy
xlsx2csv user.
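One example of the date pain: an xlsx cell holds a plain number, and
whether it is a date at all is decided by the cell's number format
(defined in `xl/styles.xml`, which this sketch does not read). If it is
a date, the number is a serial day count in either the 1900 or the 1904
date system, the latter flagged by the `date1904` attribute of
`workbookPr` in `xl/workbook.xml`. The 1900 system also contains a
fictitious 1900-02-29 (serial 60). `excel_serial_to_datetime` below is a
hypothetical helper sketching the conversion, not what xlsx2csv does
internally.

```python
from datetime import datetime, timedelta


def excel_serial_to_datetime(serial, date1904=False):
    """Convert an Excel date serial (days; fraction = time of day).

    Hypothetical helper: assumes the caller already knows, from the
    cell's number format, that the value is a date.
    """
    if date1904:
        # 1904 date system: serial 0 is 1904-01-01, no phantom leap day.
        return datetime(1904, 1, 1) + timedelta(days=serial)
    if serial < 0:
        raise ValueError("negative serials are not valid dates")
    if 60 <= serial < 61:
        # Serial 60 is 1900-02-29, a day that never existed; Excel keeps
        # it for compatibility with Lotus 1-2-3's leap-year bug.
        raise ValueError("serial 60 is the nonexistent 1900-02-29")
    if serial < 60:
        # Before the phantom day, serial 1 is 1900-01-01.
        # Values below 1 are pure times and land on 1899-12-31 here.
        return datetime(1899, 12, 31) + timedelta(days=serial)
    # From 1900-03-01 on, the phantom day shifts the epoch back one day.
    return datetime(1899, 12, 30) + timedelta(days=serial)


print(excel_serial_to_datetime(44927))                 # 2023-01-01 00:00:00
print(excel_serial_to_datetime(44927.5))               # 2023-01-01 12:00:00
print(excel_serial_to_datetime(43465, date1904=True))  # 2023-01-01 00:00:00
```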

