Re:Re: Help needed for pandas bug: Could anybody verify the suspicion that tzdata might have some influence?



The underlying issue (not strictly a bug, since the documentation specifically says not to do this: http://sources.debian.net/src/python-tz/2016.7-0.2/pytz/tzinfo.py/#L247 ) is that passing a pytz tzinfo to the datetime constructor attaches the zone's first listed offset, not the offset that is correct for that date:

>>> datetime.datetime(2017,4,1,tzinfo=pytz.timezone('Europe/London'))
datetime.datetime(2017, 4, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>)
>>> pytz.timezone('Europe/London').localize(datetime.datetime(2017,4,1))
datetime.datetime(2017, 4, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/London' BST+1:00:00 DST>)
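The mechanism can be seen without depending on a particular pytz or tzdata version by using a toy model. It is a minimal sketch, assuming a zone can be reduced to a sorted list of (UTC start, fixed offset) entries. `LONDON_2017`, `attach_first` and `localize_like` are hypothetical names, not pytz's API, and the table keeps only the 2017 transitions. Ambiguous or non-existent wall times, which localize() resolves through its is_dst argument, are ignored.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical, heavily simplified transition table for Europe/London:
# (UTC instant at which the entry takes effect, fixed-offset tzinfo).
GMT = timezone(timedelta(0), "GMT")
BST = timezone(timedelta(hours=1), "BST")
LONDON_2017 = [
    (datetime.min, GMT),
    (datetime(2017, 3, 26, 1, 0), BST),   # clocks go forward
    (datetime(2017, 10, 29, 1, 0), GMT),  # clocks go back
]


def attach_first(naive, table):
    """Toy version of datetime(..., tzinfo=zone): the first entry is used as-is."""
    return naive.replace(tzinfo=table[0][1])


def localize_like(naive, table):
    """Toy version of zone.localize(naive): use the entry in effect then."""
    chosen = table[0][1]
    for start_utc, tzinfo in table:
        # Convert the wall time to UTC using this candidate's offset and
        # keep the candidate if it has already taken effect by then.
        if naive - tzinfo.utcoffset(None) >= start_utc:
            chosen = tzinfo
    return naive.replace(tzinfo=chosen)


naive = datetime(2017, 4, 1)
print(attach_first(naive, LONDON_2017).tzname())   # GMT
print(localize_like(naive, LONDON_2017).tzname())  # BST
```

`attach_first` reproduces the GMT result of the constructor call above, and `localize_like` the BST result of localize().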

The difference between the constructor and localize() suggests the attached fix (use tz.localize() instead of tzinfo=tz), which has *not* been tested.

As for why it's only now showing up: something odd seems to be going on whereby the initial 'LMT' (local mean time) entries of timezone definitions sometimes get loaded into pytz.timezone objects but often don't. On jessie the Asia/Tokyo zone has no LMT entry, while on sid it has one (LMT+9:19:00):
jessie>>> pytz.timezone('Asia/Tokyo')._tzinfos
{(datetime.timedelta(0, 32400), datetime.timedelta(0), 'JST'): <DstTzInfo 'Asia/Tokyo' JST+9:00:00 STD>, (datetime.timedelta(0, 32400), datetime.timedelta(0), 'JCST'): <DstTzInfo 'Asia/Tokyo' JCST+9:00:00 STD>, (datetime.timedelta(0, 36000), datetime.timedelta(0, 3600), 'JDT'): <DstTzInfo 'Asia/Tokyo' JDT+10:00:00 DST>}
sid>>> pytz.timezone('Asia/Tokyo')._tzinfos
{(datetime.timedelta(0, 32400), datetime.timedelta(0), 'JST'): <DstTzInfo 'Asia/Tokyo' JST+9:00:00 STD>, (datetime.timedelta(0, 36000), datetime.timedelta(0, 3600), 'JDT'): <DstTzInfo 'Asia/Tokyo' JDT+10:00:00 DST>, (datetime.timedelta(0, 33540), datetime.timedelta(0), 'LMT'): <DstTzInfo 'Asia/Tokyo' LMT+9:19:00 STD>}

--- pandas/tests/test_multilevel0.py	2017-04-01 23:02:44.659970299 +0100
+++ pandas/tests/test_multilevel.py	2017-04-01 22:55:29.031975195 +0100
@@ -84,9 +84,9 @@ class TestMultiLevel(tm.TestCase):
         # GH 7112
         import pytz
         tz = pytz.timezone('Asia/Tokyo')
-        expected_tuples = [(1.1, datetime.datetime(2011, 1, 1, tzinfo=tz)),
-                           (1.2, datetime.datetime(2011, 1, 2, tzinfo=tz)),
-                           (1.3, datetime.datetime(2011, 1, 3, tzinfo=tz))]
+        expected_tuples = [(1.1, tz.localize(datetime.datetime(2011, 1, 1))),
+                           (1.2, tz.localize(datetime.datetime(2011, 1, 2))),
+                           (1.3, tz.localize(datetime.datetime(2011, 1, 3)))]
         expected = Index([1.1, 1.2, 1.3] + expected_tuples)
         self.assertTrue(result.equals(expected))
 
@@ -104,9 +104,9 @@ class TestMultiLevel(tm.TestCase):
 
         result = midx_lv3.append(midx_lv2)
         expected = Index._simple_new(
-            np.array([(1.1, datetime.datetime(2011, 1, 1, tzinfo=tz), 'A'),
-                      (1.2, datetime.datetime(2011, 1, 2, tzinfo=tz), 'B'),
-                      (1.3, datetime.datetime(2011, 1, 3, tzinfo=tz), 'C')] +
+            np.array([(1.1, tz.localize(datetime.datetime(2011, 1, 1)), 'A'),
+                      (1.2, tz.localize(datetime.datetime(2011, 1, 2)), 'B'),
+                      (1.3, tz.localize(datetime.datetime(2011, 1, 3)), 'C')] +
                      expected_tuples), None)
         self.assertTrue(result.equals(expected))
 