Re: NanoXML parser

To: debian-embedded@lists.debian.org
Subject: Re: NanoXML parser
From: Neil Williams <codehelp@debian.org>
Date: Fri, 15 Aug 2008 10:46:04 -0300
Message-id: <1218807964.2837.39.camel@dwarf.codehelp>
In-reply-to: <20080810102859.GT15696@tamay-dogan.net>
References: <20080801205838.GB2153@tamay-dogan.net> <1218200407.703.26.camel@dwarf.codehelp> <20080810102859.GT15696@tamay-dogan.net>

On Sun, 2008-08-10 at 12:28 +0200, Michelle Konzack wrote:
> Since I have only 8 MByte of FLASH, libxml2 is definitively to big since
> it support functionalities I never need in embedded applications..

XML is a very verbose format. Merely converting data into XML usually
increases its size by a factor of between 2 and 10 compared with almost
any other text-based format.

Personally, I try very hard to avoid XML on any resource-limited
platform. (I'm normally a strong advocate of XML and I've written a few
XML formats myself, with schemas etc.)

New applications might be better off using a custom binary backend and
supporting export formats over remote connections or with a client that
is installed on a more powerful machine so that the binary file can be
synced and then managed/exported/converted.

> > SQLite is faster than XML, it is more memory efficient than XML, just as
> > portable, supports easy data conversion into a variety of other formats
> > and is, IMHO, the defacto standard for embedded system data storage.
> 
> But you can not install SQLite in a 1-Wire Chip...  ;-)

Then it makes absolutely no sense to use XML, which is inherently far
more verbose than SQLite. If there is no room for the sqlite library,
there is no room for XML data of any reasonable size.

Maybe use a config file or key:value type format?

[mystruct]
id=9
name=string
etc.

The comparative XML is:
<?xml version="1.0"?>
<mystruct>
<id>9</id>
<name>string</name>
</mystruct>

By my estimation, that is close to a three-fold increase in data size,
just counting characters.
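
As a quick sanity check on that estimate, the two samples above can be
counted exactly. This is only a sketch: it leaves out the "etc."
placeholder line and assumes one newline at the end of every line. For
this tiny sample the ini text is 28 bytes and the XML is 76 bytes, so
the ratio comes out a little under three; it shifts with the length of
the tag names relative to the values.

```python
# The ini sample from above, without the "etc." placeholder line.
INI_TEXT = (
    "[mystruct]\n"
    "id=9\n"
    "name=string\n"
)

# The equivalent XML sample from above.
XML_TEXT = (
    '<?xml version="1.0"?>\n'
    "<mystruct>\n"
    "<id>9</id>\n"
    "<name>string</name>\n"
    "</mystruct>\n"
)

# The actual payload is just the two values.
PAYLOAD = "9" + "string"


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times bigger `larger` is than `smaller`, in bytes."""
    return len(larger.encode("ascii")) / len(smaller.encode("ascii"))


if __name__ == "__main__":
    print("ini bytes:    ", len(INI_TEXT))
    print("XML bytes:    ", len(XML_TEXT))
    print("payload bytes:", len(PAYLOAD))
    print("XML / ini:     %.2f" % size_ratio(XML_TEXT, INI_TEXT))
```

Dropping the XML declaration line would reduce the overhead for a single
record, but the per-element open and close tags remain for every value.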

First hit on Google for an ini file parser gave:
http://ndevilla.free.fr/iniparser/
The source tarball is only 25kB.
"iniParser is a simple C library offering ini file parsing services. The
library is pretty small (less than 1500 lines of C) and robust, and does
not depend on any other external library to compile. It is written in
ANSI C and should compile on most platforms without difficulty."
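
To give a feel for how little code the consuming side of such a format
needs, here is a minimal sketch that reads the [mystruct] sample with
Python's standard-library configparser. It only illustrates the format,
for example for a desktop-side tool; it is not the iniParser C API,
which has its own functions, and the helper name load_mystruct is made
up for this example.

```python
import configparser

# The ini sample from earlier in this mail, without the "etc." line.
SAMPLE = """\
[mystruct]
id=9
name=string
"""


def load_mystruct(text: str) -> dict:
    """Parse ini text and return the [mystruct] section as a dict.

    Hypothetical helper: the field names and types (integer id, string
    name) are taken from the sample above.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["mystruct"]
    return {"id": section.getint("id"), "name": section["name"]}


if __name__ == "__main__":
    print(load_mystruct(SAMPLE))
```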

> > It is empirically "for Java" and hasn't been updated since 2003. It has
> > been around all that time and nobody has considered packaging it for
> > Debian yet. Choosing to use it could easily mean that you effectively
> > become the current upstream maintainer as well as packaging maintainer.
> 
> But AFAIK it is bugfree.

Sorry, Michelle, that is just tosh. :-) Packages that have been dead
upstream for 5 years are already likely to be suffering from bitrot.
Once you bring such zombie code to the Debian autobuilders, a whole set
of new bugs can arise - especially if the package tries all kinds of
neat tricks to reduce size at the expense of portability. You might not
need your package to build on mips, but you'll still get bug reports if
it fails to build there.

iniparser has a ChangeLog up to:
2008-01-03 19:42  ndevilla

> Maybe I a will pack it for me and of course Debian.

I just think that a small XML parser is a bit of an oxymoron - XML
simply isn't a small format. Having a small binary that produces
probably one of the most verbose formats so far invented isn't exactly
logical. That could be one reason why nobody packaged it for Debian when
it was actively maintained 5 years ago.

> How much diskspace does libglib consume on EmDebian?

More than libsqlite0. Then again, there is no GNOME-based GUI (including
GPE) that does not use libglib2.0-0. Xorg itself doesn't need libxml2;
it's Gtk+ that brings it in.

> > "The "GMarkup" parser is intended to parse a simple markup format that's
> > a subset of XML. This is a small, efficient, easy-to-use parser. It
> > should not be used if you expect to interoperate with other applications
> > generating full-scale XML. However, it's very useful for application
> 
> OK; this is not the case since NanoXML != XML.
> They have different goals.

Please explain: if NanoXML doesn't produce XML, why does it use that
name? You can't make XML into a small format; the very nature of XML is
to be verbose. Every string needs extra characters to surround it, every
piece of data needs descriptive metadata.

> > data files, config files, etc. where you know your application will be
> > the only one writing the file. Full-scale XML parsers should be able to
> > parse the subset used by GMarkup, so you can easily migrate to
> > full-scale XML at a later time if the need arises."
> 
> It seems I have to look into GMarkup...

Or CSV. ;-)

> > With this in mind, newly written code - IMHO - should be targetted at a
> > small XML parser that explicitly supports later migration to a more
> > capable XML parser and which is already available in Debian and existing
> > package sets. Porting existing code to GMarkup or nanoxml is a
> > non-trivial exercise.
> > 
> > libxml2 is a 1Mb shared library package in Emdebian.
> 
> Ops, it 3 times bigger then my Kernel...

In which case, you could easily end up with the XML data files also
being bigger than your kernel when the equivalent SQLite data would be
about 10x smaller than the XML. Even CSV would probably be a third of
the size of the XML file. Do you really want the CPU burden of runtime
compression with gzip?
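
The CSV comparison is easy to try on a desktop machine before committing
to a format. The sketch below uses made-up records (an integer id and a
short name, reusing the mystruct layout from earlier) and hypothetical
helper names; the exact ratio depends entirely on how long the values
are compared with the tag names, so treat the output as an illustration
rather than a measurement of any real data set. With values this short
the tag overhead dominates, so the gap is wider than the one-third
figure above.

```python
import gzip


def make_records(count):
    """Hypothetical sample data: an integer id and a short name."""
    return [(i, f"name{i}") for i in range(count)]


def to_xml(records):
    """Serialise records as one <mystruct> element per line."""
    lines = ['<?xml version="1.0"?>', "<records>"]
    for rec_id, name in records:
        lines.append(
            f"<mystruct><id>{rec_id}</id><name>{name}</name></mystruct>"
        )
    lines.append("</records>")
    return ("\n".join(lines) + "\n").encode("ascii")


def to_csv(records):
    """Serialise records as CSV with a single header line."""
    lines = ["id,name"]
    lines.extend(f"{rec_id},{name}" for rec_id, name in records)
    return ("\n".join(lines) + "\n").encode("ascii")


records = make_records(100)
xml_bytes = to_xml(records)
csv_bytes = to_csv(records)
xml_gz = gzip.compress(xml_bytes)

if __name__ == "__main__":
    print("XML bytes:        ", len(xml_bytes))
    print("CSV bytes:        ", len(csv_bytes))
    print("gzipped XML bytes:", len(xml_gz))
```

gzip narrows the gap on storage, but only by spending CPU time and RAM
on every read and write, which is exactly the burden in question.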

-- 

Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
