Your message dated Tue, 02 Jan 2024 11:06:19 -0600
with message-id <1847787.VLH7GnMWUR@riemann>
and subject line Re: ghostscript: new PDF interpreter may yield an incorrect ToUnicode CMap with the presence of U+2308 LEFT CEILING in input
has caused the Debian Bug report #1009992,
regarding ghostscript: new PDF interpreter may yield an incorrect ToUnicode CMap with the presence of U+2308 LEFT CEILING in input
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)

--
1009992: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009992
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: ghostscript: new PDF interpreter may yield an incorrect ToUnicode CMap with the presence of U+2308 LEFT CEILING in input
- From: Vincent Lefevre <vincent@vinc17.net>
- Date: Fri, 22 Apr 2022 00:38:38 +0200
- Message-id: <20220421223838.GA56342@zira.vinc17.org>
Package: ghostscript
Version: 9.56.1~dfsg-1
Severity: normal
Tags: upstream
Forwarded: https://bugs.ghostscript.com/show_bug.cgi?id=705246

When an input PDF file has a character like U+2308 LEFT CEILING
and has a ToUnicode CMap, the new PDF interpreter may yield an
incorrect ToUnicode CMap in the generated PDF.

The issue seems to be limited to characters like math symbols
(in the same font as the problematic character?), though; letters,
including accented ones, do not seem to be affected.

Here's a shell script used for some testing:

────────────────────────────────────────────────────────────────────────
#!/bin/sh

set -e

out()
{
  echo -n "$i$j ($1):"
  printf " %s" $(pdftotext chartest9$i$j$2.pdf - | tr -d '\f')
  echo
}

for i in a b
do
  for j in 0 1
  do
    cat <<'EOF' | sed "s/:$i/\\\\lceil/" | \
                  sed "s/:a//" | \
                  sed "s/J/$j/" > chartest9.tex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\pdfgentounicode=J
\begin{document}
\thispagestyle{empty}
$\in:a$
\end{document}
EOF
    pdflatex chartest9.tex > /dev/null
    mv chartest9.pdf chartest9$i$j.pdf
    out "pdfTeX" ""
    ps2pdf14 chartest9$i$j.pdf chartest9$i$j-new.pdf
    out "gs new" "-new"
    ps2pdf14 -dNEWPDF=false chartest9$i$j.pdf chartest9$i$j-old.pdf
    out "gs old" "-old"
  done
done
────────────────────────────────────────────────────────────────────────

See the upstream bug for the obtained PDF files.

4 kinds of PDF inputs are tested (a0, a1, b0, b1), where
  * a: the content corresponds to "∈⌈" (ELEMENT OF + LEFT CEILING)
  * b: the content corresponds to "∈" (ELEMENT OF)
  * 0: \pdfgentounicode=0 (pdfTeX does not generate a ToUnicode CMap)
  * 1: \pdfgentounicode=1 (pdfTeX generates a ToUnicode CMap)

I've compared (see above script for details):
  * pdfTeX: PDF file generated by pdfTeX from TeX Live 2022
  * gs new: PDF file obtained with the new PDF interpreter (default)
  * gs old: PDF file obtained with the old PDF interpreter (dNEWPDF=false)

I've done the tests with the ghostscript 9.56.1~dfsg-1 Debian package.
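As a side check, the template substitution that produces the four inputs can be exercised on its own, without pdflatex or Ghostscript. The following is only a sketch: the `variant` function is a hypothetical helper, not part of the test script, and it keeps just the two template lines that differ between a0, a1, b0 and b1.

```sh
#!/bin/sh
# Sketch only: "variant" is a hypothetical helper, not part of the
# original test script. It applies the same three sed substitutions
# to the two template lines that vary between the four inputs.
variant()
{
  v=$1   # content variant: a (with \lceil) or b (without)
  g=$2   # value substituted into \pdfgentounicode: 0 or 1
  cat <<'EOF' | sed "s/:$v/\\\\lceil/" | sed "s/:a//" | sed "s/J/$g/"
\pdfgentounicode=J
$\in:a$
EOF
}

# Print the generated lines for each of the four inputs.
for i in a b
do
  for j in 0 1
  do
    printf '%s%s: ' "$i" "$j"
    variant "$i" "$j" | tr '\n' ' '
    echo
  done
done
```

For variant a the placeholder ":a" becomes \lceil; for variant b the first sed does not match and the second one simply removes the placeholder.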
If LEFT CEILING is not present, Ghostscript does not generate a
ToUnicode CMap in all of these cases, which is fine.

But if this character is present:

1. With the old PDF interpreter, Ghostscript generates a correct
   ToUnicode CMap.

2. With the new PDF interpreter and no input ToUnicode CMap,
   Ghostscript does not generate a ToUnicode CMap (the only
   practical issue is that one cannot get unusual characters
   like LEFT CEILING, but this is not worse than what TeX Live
   2022 can yield in any case).

3. With the new PDF interpreter and an input ToUnicode CMap like
   the one from TeX Live 2022, Ghostscript generates an incorrect
   ToUnicode CMap, which prevents one from getting usual math
   characters such as ELEMENT OF.

The results, where I've added ToUnicode CMap information (which
I have obtained with "qpdf --stream-data=uncompress" on these
PDF files):

a0 (pdfTeX): ∈d (no CMap)
a0 (gs new): ∈d (no CMap)
a0 (gs old): ∈⌈ (CMap old)
a1 (pdfTeX): ∈d (CMap 1)
a1 (gs new): (CMap 1-new)
a1 (gs old): ∈⌈ (CMap old)
b0 (pdfTeX): ∈ (no CMap)
b0 (gs new): ∈ (no CMap)
b0 (gs old): ∈ (no CMap)
b1 (pdfTeX): ∈ (CMap 1)
b1 (gs new): ∈ (no CMap)
b1 (gs old): ∈ (no CMap)

with the following ToUnicode CMaps:

CMap old:
────────────────────────────────────────
begincmap
/CMapType 2 def
/CMapName/R11 def
1 begincodespacerange
<00><ff>
endcodespacerange
2 beginbfrange
<32><32><2208>
<64><64><2308>
endbfrange
endcmap
────────────────────────────────────────

CMap 1:
────────────────────────────────────────
begincmap
/CIDSystemInfo
<< /Registry (TeX)
/Ordering (lmsy10-lm-mathsy)
/Supplement 0
>> def
/CMapName /TeX-lmsy10-lm-mathsy-0 def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
0 beginbfrange
endbfrange
0 beginbfchar
endbfchar
endcmap
────────────────────────────────────────

CMap 1-new:
────────────────────────────────────────
begincmap
/CMapType 2 def
/CMapName/R11 def
1 begincodespacerange
<00><ff>
endcodespacerange
2 beginbfrange
<32><32><00>
<64><64><00>
endbfrange
endcmap
────────────────────────────────────────

-- System Information:
Debian Release: bookworm/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.17.0-1-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=POSIX, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages ghostscript depends on:
ii  libc6   2.33-7
ii  libgs9  9.56.1~dfsg-1

ghostscript recommends no packages.

Versions of packages ghostscript suggests:
ii  ghostscript-x  9.56.1~dfsg-1

-- no debconf information

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
--- End Message ---
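Editorial sketch, not part of either message: the difference between "CMap old" and "CMap 1-new" in the report is that the bfrange destinations become <00> instead of <2208> and <2308>. Assuming the CMap text has already been made readable (the report used "qpdf --stream-data=uncompress"), a small filter along the following lines could flag such entries. The function name `check_cmap` is hypothetical, and the awk parsing only handles the simple one-entry-per-line form shown in the report (no array destinations).

```sh
#!/bin/sh
# Sketch only: "check_cmap" is a hypothetical helper. It reads an
# uncompressed ToUnicode CMap on stdin, prints each bfrange/bfchar
# mapping, and marks entries whose destination consists only of zeros.
# Entries using the array form "[<...> <...>]" are skipped.
check_cmap()
{
  awk '
    /beginbfrange/         { n = 3; next }   # entries: <lo><hi><dst>
    /beginbfchar/          { n = 2; next }   # entries: <src><dst>
    /endbfrange|endbfchar/ { n = 0; next }
    n > 0 {
      gsub(/[<>]/, " ")                      # split hex strings into fields
      if (NF != n) next
      dst = $n
      note = (dst ~ /^0+$/) ? "  <- null destination" : ""
      printf "%s..%s -> U+%s%s\n", $1, $(n - 1), toupper(dst), note
    }
  '
}

# Example input: the bfrange section of "CMap 1-new" from the report.
check_cmap <<'EOF'
2 beginbfrange
<32><32><00>
<64><64><00>
endbfrange
EOF
```

Both entries of "CMap 1-new" are flagged, while the same check on "CMap old" reports U+2208 and U+2308 without a flag.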
--- Begin Message ---
- To: 1009992-done@bugs.debian.org
- Subject: Re: ghostscript: new PDF interpreter may yield an incorrect ToUnicode CMap with the presence of U+2308 LEFT CEILING in input
- From: Steven Robbins <steve@sumost.ca>
- Date: Tue, 02 Jan 2024 11:06:19 -0600
- Message-id: <1847787.VLH7GnMWUR@riemann>
- In-reply-to: <20220421223838.GA56342@zira.vinc17.org>
On Fri, 22 Apr 2022 00:38:38 +0200 Vincent Lefevre <vincent@vinc17.net> wrote:
> Package: ghostscript
> Version: 9.56.1~dfsg-1
> Severity: normal
> Tags: upstream
> Forwarded: https://bugs.ghostscript.com/show_bug.cgi?id=705246

The upstream bug report includes a note from Vincent that it is fixed in version 10.

Attachment: signature.asc
Description: This is a digitally signed message part.
--- End Message ---