[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#780276: python-ooolib: Error in cleaning method causes corrupted formulas in spreadsheets



Package: python-ooolib
Version: 0.0.17-2.1
Severity: normal
Tags: patch

The CalcSheet.clean_formula() method does some replacement of strings in
order to ensure proper syntax.  However, it replaces strings regardless
of their context, which means that some can be replaced twice.  The
result is corrupted formulas.  Here's an interactive example:

roland@polymir ~ $ python
Python 2.7.9 (default, Mar  1 2015, 12:57:24) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ooolib
>>> doc = ooolib.Calc()
>>> doc.set_cell_value(1,20,"formula","=SUM(A1:A9)")
>>> doc.get_cell_value(1,20)
('formula', u'oooc:=SUM([.A1]:[.A9])')
>>> doc.set_cell_value(1,20,"formula","=SUM(A1:A10)")
>>> doc.get_cell_value(1,20)
('formula', u'oooc:=SUM([.A1]:[.[.A1]0])')

Note the error in the "cleaned" formula here: the A1 part gets replaced
when the A10 part has already been replaced, so we get extra brackets
and an invalid formula.  A fixed implementation uses word-limit markers
to make sure we only replace A1 (and not part of A10) with [.A1]:

>>> import re
>>> def my_clean_formula(self,data):
...     "Returns a formula for use in ODF"
...     # Example Translations
...     # '=SUM(A1:A2)'
...     # datavalue = 'oooc:=SUM([.A1:.A2])'
...     # '=IF((A5>A4);A4;"")'
...     # datavalue = 'oooc:=IF(([.A5]>[.A4]);[.A4];"")'
...     data = str(data)
...     data = clean_string(data)
...     redata = re.search('^=([A-Z]+)(\(.*)$', data)
...     if redata:
...         # funct is the function name.  The rest if the string will be the functArgs
...         funct = redata.group(1)
...         functArgs = redata.group(2)
...         # Search for cell lebels and replace them
...         reList = re.findall('([A-Z]+\d+)', functArgs)
...         # sort and keep track so we do not do a cell more than once
...         reList.sort()
...         lastVar = ''
...         while reList:
...             # Replace each cell label
...             curVar = reList.pop()
...             if curVar == lastVar: continue
...             lastVar = curVar
...             functArgs = re.sub('\b'+curVar+'\b', '[.%s]' % curVar, functArgs)
...         data = 'oooc:=%s%s' % (funct, functArgs)
...     return data
... 
>>> from ooolib import clean_string
>>> ooolib.CalcSheet.clean_formula = my_clean_formula
>>> doc.set_cell_value(1,20,"formula","=SUM(A1:A10)")
>>> doc.get_cell_value(1,20)
('formula', u'oooc:=SUM(A1:A10)')
>>>

In other words, the following one-line patch fixes the bug:

--- __init__.py.old	2015-03-11 15:13:22.978654241 +0100
+++ __init__.py	2015-03-11 15:13:25.686678190 +0100
@@ -665,7 +665,7 @@
 				curVar = reList.pop()
 				if curVar == lastVar: continue
 				lastVar = curVar
-				functArgs = functArgs.replace(curVar, '[.%s]' % curVar)
+                                functArgs = re.sub('\b'+curVar+'\b', '[.%s]' % curVar, functArgs)
 			data = 'oooc:=%s%s' % (funct, functArgs)
 		return data
 
  Thanks,

Roland.
-- System Information:
Debian Release: 8.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=fr_FR.utf8, LC_CTYPE=fr_FR.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages python-ooolib depends on:
ii  python     2.7.8-4
ii  python2.7  2.7.9-2

python-ooolib recommends no packages.

python-ooolib suggests no packages.

-- no debconf information

-- 
Roland Mas

Indépendant en informatique libre -- Free software freelance
http://www.gnurandal.com/

Reply to: