Bug#1063723: marked as done (bullseye-pu: package pypdf2/1.26.0-4)

To: Jonathan Wiltshire <jmw@coccia.debian.org>
Subject: Bug#1063723: marked as done (bullseye-pu: package pypdf2/1.26.0-4)
From: "Debian Bug Tracking System" <owner@bugs.debian.org>
Date: Sat, 29 Jun 2024 10:52:09 +0000
Message-id: <handler.1063723.D1063723.17196580701197075.ackdone@bugs.debian.org>
Reply-to: 1063723@bugs.debian.org
References: <E1sNVcQ-002bpM-7H@coccia.debian.org> <170767863801.133386.12142364819931190732.reportbug@Zini-1880>

Your message dated Sat, 29 Jun 2024 10:47:46 +0000
with message-id <E1sNVcQ-002bpM-7H@coccia.debian.org>
and subject line Released with 11.10
has caused the Debian Bug report #1063723,
regarding bullseye-pu: package pypdf2/1.26.0-4
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1063723: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1063723
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems

--- Begin Message ---

To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: bullseye-pu: package pypdf2/1.26.0-4
From: Scott Kitterman <debian@kitterman.com>
Date: Sun, 11 Feb 2024 14:10:38 -0500
Message-id: <170767863801.133386.12142364819931190732.reportbug@Zini-1880>

Package: release.debian.org
Severity: normal
Tags: bullseye
User: release.debian.org@packages.debian.org
Usertags: pu

[ Reason ]
Fixes two no-DSA CVE issues.

[ Impact ]
Continued low-level security risk.

[ Tests ]
None that are run in the Debian package or autopkgtest.

[ Risks ]
Risk is low.  Code changes are relatively simple and were provided by
upstream.  These same two patches were previously released for LTS with
no known issues.

[ Checklist ]
  [X] *all* changes are documented in the d/changelog
  [X] I reviewed all changes and I approve them
  [X] attach debdiff against the package in (old)stable
  [X] the issue is verified as fixed in unstable

[ Changes ]
  * Forward-port CVE fixes by LTS team
    - CVE-2023-36810: Quadratic runtime with malformed PDF missing xref marker.
    - Fix CVE-2022-24859:
        Sebastian Krause discovered that manipulated inline images can force
        PyPDF2, a pure Python PDF library, into an infinite loop, if a
        maliciously crafted PDF file is processed.
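A minimal sketch of the idea behind the CVE-2022-24859 fix, written independently of PyPDF2: scan for the `EI` (End Image) operator in fixed-size chunks and raise an error when the stream runs out, instead of looping forever. The helper name `read_inline_image_data`, the simplified terminator rule (`EI` followed by whitespace or end of data) and the use of `ValueError` in place of PyPDF2's `utils.PdfReadError` are assumptions for illustration only.

```python
from io import BytesIO


def read_inline_image_data(stream, chunk_size=8192):
    """Return the bytes from the current position up to the next 'EI'.

    Hypothetical helper, simplified from the patched logic. Assumption:
    'EI' ends the image only when followed by whitespace or end of data;
    PyPDF2's real check (whitespace, then 'Q') differs in detail.
    """
    data = BytesIO()
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            # Without this check a truncated stream keeps the loop
            # spinning, which is the shape of the CVE-2022-24859 bug.
            raise ValueError("Unexpected end of stream")
        loc = buf.find(b"E")
        if loc == -1:
            data.write(buf)
            continue
        # Keep everything before the 'E', then rewind to just after it.
        data.write(buf[:loc])
        stream.seek(loc + 1 - len(buf), 1)
        nxt = stream.read(1)
        if nxt == b"I":
            after = stream.read(1)
            if after == b"" or after.isspace():
                return data.getvalue()
            # 'EI' followed by something else was ordinary image data.
            data.write(b"EI")
            stream.seek(-1, 1)
        else:
            data.write(b"E")
            if nxt:
                stream.seek(-1, 1)  # re-examine the byte after the 'E'
```

Every pass through the loop either reads a new chunk, advances past at least one byte, or raises on an empty read, so the scan is bounded by the stream length and an unterminated image fails fast.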

[ Other info ]
This may show up as an NMU (lintian thinks so), but I am now the maintainer
for testing/unstable.  I chose not to update the maintainer fields in this
update, to keep it minimal and focused on the security issues.

Scott K

diff -Nru pypdf2-1.26.0/debian/changelog pypdf2-1.26.0/debian/changelog
--- pypdf2-1.26.0/debian/changelog	2020-01-19 03:08:58.000000000 -0500
+++ pypdf2-1.26.0/debian/changelog	2024-02-11 13:50:22.000000000 -0500
@@ -1,3 +1,14 @@
+pypdf2 (1.26.0-4+deb11u1) bullseye; urgency=medium
+
+  * Forward-port CVE fixes by LTS team
+    - CVE-2023-36810: Quadratic runtime with malformed PDF missing xref marker.
+    - Fix CVE-2022-24859:
+        Sebastian Krause discovered that manipulated inline images can force
+        PyPDF2, a pure Python PDF library, into an infinite loop, if a
+        maliciously crafted PDF file is processed.
+
+ -- Scott Kitterman <scott@kitterman.com>  Sun, 11 Feb 2024 13:50:22 -0500
+
 pypdf2 (1.26.0-4) unstable; urgency=medium
 
   * Remove Python 2 from build dependencies (closes: #937505).
diff -Nru pypdf2-1.26.0/debian/patches/0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch pypdf2-1.26.0/debian/patches/0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch
--- pypdf2-1.26.0/debian/patches/0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch	1969-12-31 19:00:00.000000000 -0500
+++ pypdf2-1.26.0/debian/patches/0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch	2024-02-11 13:49:50.000000000 -0500
@@ -0,0 +1,50 @@
+From 82ee233ea82a40c626e95a191fe2d52c745db870 Mon Sep 17 00:00:00 2001
+From: dsk7 <jensg@posteo.de>
+Date: Sat, 23 Apr 2022 19:12:13 +0200
+Subject: MAINT: Quadratic runtime while parsing reduced to linear  (#808)
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When the PdfFileReader tries to find the xref marker, the readNextEndLine methods builds a so called line by reading byte-for-byte. Every time a new byte is read, it is concatenated with the currently read line. This leads to quadratic runtime O(n²) behavior as Python strings (also byte-strings) are immutable and have to be copied where n is the size of the file.
+For files where the xref marker can not be found at the end this takes a enormous amount of time:
+
+* 1mb of zeros at the end: 45.54 seconds
+* 2mb of zeros at the end: 357.04 seconds
+(measured on a laptop made in 2015)
+
+This pull request changes the relevant section of the code to become linear runtime O(n), leading to a run time of less then a second for both cases mentioned above. Furthermore this PR adds a regression test.
+---
+ PyPDF2/pdf.py | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/PyPDF2/pdf.py b/PyPDF2/pdf.py
+index 9979414..8b355e0 100644
+--- a/PyPDF2/pdf.py
++++ b/PyPDF2/pdf.py
+@@ -1930,7 +1930,7 @@ class PdfFileReader(object):
+     def readNextEndLine(self, stream):
+         debug = False
+         if debug: print(">>readNextEndLine")
+-        line = b_("")
++        line_parts = []
+         while True:
+             # Prevent infinite loops in malformed PDFs
+             if stream.tell() == 0:
+@@ -1957,10 +1957,10 @@ class PdfFileReader(object):
+                 break
+             else:
+                 if debug: print("  x is neither")
+-                line = x + line
+-                if debug: print(("  RNEL line:", line))
++                line_parts.append(x)
+         if debug: print("leaving RNEL")
+-        return line
++        line_parts.reverse()
++        return b"".join(line_parts)
+ 
+     def decrypt(self, password):
+         """
+-- 
+2.30.2
+
diff -Nru pypdf2-1.26.0/debian/patches/CVE-2022-24859.patch pypdf2-1.26.0/debian/patches/CVE-2022-24859.patch
--- pypdf2-1.26.0/debian/patches/CVE-2022-24859.patch	1969-12-31 19:00:00.000000000 -0500
+++ pypdf2-1.26.0/debian/patches/CVE-2022-24859.patch	2024-02-11 13:49:50.000000000 -0500
@@ -0,0 +1,63 @@
+From: Markus Koschany <apo@debian.org>
+Date: Fri, 3 Jun 2022 08:12:01 +0200
+Subject: CVE-2022-24859
+
+Bug-Debian: https://bugs.debian.org/1009879
+Origin: https://github.com/py-pdf/PyPDF2/pull/740
+---
+ PyPDF2/pdf.py | 32 ++++++++++++++++++++++----------
+ 1 file changed, 22 insertions(+), 10 deletions(-)
+
+diff --git a/PyPDF2/pdf.py b/PyPDF2/pdf.py
+index 9979414..b55dfba 100644
+--- a/PyPDF2/pdf.py
++++ b/PyPDF2/pdf.py
+@@ -2723,11 +2723,25 @@ class ContentStream(DecodedStreamObject):
+         # left at beginning of ID
+         tmp = stream.read(3)
+         assert tmp[:2] == b_("ID")
+-        data = b_("")
++        data = BytesIO()
++        # Read the inline image, while checking for EI (End Image) operator.
+         while True:
+-            # Read the inline image, while checking for EI (End Image) operator.
+-            tok = stream.read(1)
+-            if tok == b_("E"):
++            # Read 8 kB at a time and check if the chunk contains the E operator.
++            buf = stream.read(8192)
++            # We have reached the end of the stream, but haven't found the EI operator.
++            if not buf:
++                raise utils.PdfReadError("Unexpected end of stream")
++            loc = buf.find(b_("E"))
++
++            if loc == -1:
++                data.write(buf)
++            else:
++                # Write out everything before the E.
++                data.write(buf[0:loc])
++
++                # Seek back in the stream to read the E next.
++                stream.seek(loc - len(buf), 1)
++                tok = stream.read(1)
+                 # Check for End Image
+                 tok2 = stream.read(1)
+                 if tok2 == b_("I"):
+@@ -2744,14 +2758,12 @@ class ContentStream(DecodedStreamObject):
+                         stream.seek(-1, 1)
+                         break
+                     else:
+-                        stream.seek(-1,1)
+-                        data += info
++                        stream.seek(-1, 1)
++                        data.write(info)
+                 else:
+                     stream.seek(-1, 1)
+-                    data += tok
+-            else:
+-                data += tok
+-        return {"settings": settings, "data": data}
++                    data.write(tok)
++        return {"settings": settings, "data": data.getvalue()}
+ 
+     def _getData(self):
+         newdata = BytesIO()
diff -Nru pypdf2-1.26.0/debian/patches/series pypdf2-1.26.0/debian/patches/series
--- pypdf2-1.26.0/debian/patches/series	2016-09-05 13:14:14.000000000 -0400
+++ pypdf2-1.26.0/debian/patches/series	2024-02-11 13:49:50.000000000 -0500
@@ -1 +1,3 @@
 Prevent_infinite_loop_in_readObject.patch
+CVE-2022-24859.patch
+0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch
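The first patch in the debdiff (CVE-2023-36810) targets `line = x + line`: bytes objects are immutable, so each prepend copies the whole line read so far and reading n bytes costs O(n²). The sketch below reproduces the pattern outside PyPDF2 under simplifying assumptions: the hypothetical helpers `read_line_backwards_naive` and `read_line_backwards` walk back from the current position one byte at a time and stop at `\n`, `\r` or the start of the stream, whereas the real `readNextEndLine` also handles CR/LF pairs and other details.

```python
from io import BytesIO


def read_line_backwards_naive(stream):
    """Quadratic version: each prepend copies the growing bytes object."""
    line = b""
    while stream.tell() > 0:
        stream.seek(-1, 1)
        x = stream.read(1)
        stream.seek(-1, 1)
        if x in (b"\n", b"\r"):
            break
        line = x + line  # O(len(line)) copy on every byte
    return line


def read_line_backwards(stream):
    """Linear version: append to a list, then reverse and join once."""
    parts = []
    while stream.tell() > 0:
        stream.seek(-1, 1)   # step back one byte
        x = stream.read(1)   # read it (this moves forward again)
        stream.seek(-1, 1)   # leave the position before that byte
        if x in (b"\n", b"\r"):
            break
        parts.append(x)      # amortised O(1)
    parts.reverse()          # bytes were collected last-to-first
    return b"".join(parts)


buf = BytesIO(b"trailer\n%%EOF")
buf.seek(0, 2)               # start at the end, as when hunting for xref
print(read_line_backwards(buf))
```

Collecting into a list and joining once mirrors the patch (`line_parts.append(x)`, then `reverse()` and `b"".join(...)`); only the final join copies the data.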

--- End Message ---

--- Begin Message ---

To: 1063723-done@bugs.debian.org
Subject: Released with 11.10
From: Jonathan Wiltshire <jmw@coccia.debian.org>
Date: Sat, 29 Jun 2024 10:47:46 +0000
Message-id: <E1sNVcQ-002bpM-7H@coccia.debian.org>

Version: 11.10

The upload requested in this bug has been released as part of 11.10.
--- End Message ---
