
Bug#1009992: marked as done (ghostscript: new PDF interpreter may yield an incorrect ToUnicode CMap with the presence of U+2308 LEFT CEILING in input)



Your message dated Tue, 02 Jan 2024 11:06:19 -0600
with message-id <1847787.VLH7GnMWUR@riemann>
and subject line Re: ghostscript: new PDF interpreter may yield an incorrect ToUnicode CMap with the presence of U+2308 LEFT CEILING in input
has caused the Debian Bug report #1009992,
regarding ghostscript: new PDF interpreter may yield an incorrect ToUnicode CMap with the presence of U+2308 LEFT CEILING in input
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1009992: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009992
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: ghostscript
Version: 9.56.1~dfsg-1
Severity: normal
Tags: upstream
Forwarded: https://bugs.ghostscript.com/show_bug.cgi?id=705246

When an input PDF file has a character like U+2308 LEFT CEILING and
has a ToUnicode CMap, the new PDF interpreter may yield an incorrect
ToUnicode CMap in the generated PDF. The issue seems to be limited
to characters like math symbols (in the same font as the problematic
character?), though; letters, including accented ones, do not seem
to be affected.

Here's a shell script used for some testing:

────────────────────────────────────────────────────────────────────────
#!/bin/sh

set -e

out()
{
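  # printf is used instead of the non-portable "echo -n".
  printf '%s (%s):' "$i$j" "$1"
  # The command substitution is left unquoted on purpose (word splitting).
  printf " %s" $(pdftotext "chartest9$i$j$2.pdf" - | tr -d '\f')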
  echo
}

for i in a b
do
  for j in 0 1
  do
    cat <<'EOF' | sed "s/:$i/\\\\lceil/" | \
                  sed "s/:a//" | \
                  sed "s/J/$j/" > chartest9.tex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\pdfgentounicode=J
\begin{document}
\thispagestyle{empty}
$\in:a$
\end{document}
EOF

    pdflatex chartest9.tex > /dev/null
    mv chartest9.pdf chartest9$i$j.pdf
    out "pdfTeX" ""

    ps2pdf14 chartest9$i$j.pdf chartest9$i$j-new.pdf
    out "gs new" "-new"

    ps2pdf14 -dNEWPDF=false chartest9$i$j.pdf chartest9$i$j-old.pdf
    out "gs old" "-old"
  done
done
────────────────────────────────────────────────────────────────────────

See the upstream bug for the obtained PDF files.

Four kinds of PDF input are tested (a0, a1, b0, b1), where
  * a: the content corresponds to "∈⌈" (ELEMENT OF + LEFT CEILING)
  * b: the content corresponds to "∈" (ELEMENT OF)
  * 0: \pdfgentounicode=0 (pdfTeX does not generate a ToUnicode CMap)
  * 1: \pdfgentounicode=1 (pdfTeX generates a ToUnicode CMap)

I've compared (see the script above for details):
  * pdfTeX: PDF file generated by pdfTeX from TeX Live 2022
  * gs new: PDF file obtained with the new PDF interpreter (default)
  * gs old: PDF file obtained with the old PDF interpreter (-dNEWPDF=false)

I've done the tests with the ghostscript 9.56.1~dfsg-1 Debian package.

If LEFT CEILING is not present, Ghostscript does not generate
a ToUnicode CMap in any of these cases, which is fine. But if
this character is present:

1. With the old PDF interpreter, Ghostscript generates a correct
ToUnicode CMap.

2. With the new PDF interpreter and no input ToUnicode CMap,
Ghostscript does not generate a ToUnicode CMap (the only practical
issue is that one cannot get unusual characters like LEFT CEILING, but
this is no worse than what TeX Live 2022 yields in any case).

3. With the new PDF interpreter and an input ToUnicode CMap like
the one from TeX Live 2022, Ghostscript generates an incorrect
ToUnicode CMap, which prevents one from getting usual math
characters such as ELEMENT OF.
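
To tell which of these three cases a given output file falls into, one
can look at its CMap sections directly. The following is a minimal
sketch and not part of the test script above: print_cmaps is a
hypothetical helper name, and it assumes that the PDF data on standard
input already has uncompressed streams (for instance after being
rewritten with qpdf's --stream-data=uncompress option).

```shell
#!/bin/sh
# Hypothetical helper: print every "begincmap ... endcmap" section found
# in uncompressed PDF data read from standard input.
# LC_ALL=C makes sed treat the input as plain bytes, since a PDF may
# contain byte sequences that are invalid in the current locale.
print_cmaps()
{
  LC_ALL=C sed -n '/begincmap/,/endcmap/p'
}
```

Assuming qpdf accepts "-" as the output file name to write to standard
output, a possible use is
"qpdf --stream-data=uncompress chartest9a1-new.pdf - | print_cmaps".
No output at all then corresponds to the "no CMap" situation.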

The results, where I've added ToUnicode CMap information (which I have
obtained with "qpdf --stream-data=uncompress" on these PDF files):

a0 (pdfTeX): ∈d (no CMap)
a0 (gs new): ∈d (no CMap)
a0 (gs old): ∈⌈ (CMap old)
a1 (pdfTeX): ∈d (CMap 1)
a1 (gs new):    (CMap 1-new)
a1 (gs old): ∈⌈ (CMap old)
b0 (pdfTeX): ∈  (no CMap)
b0 (gs new): ∈  (no CMap)
b0 (gs old): ∈  (no CMap)
b1 (pdfTeX): ∈  (CMap 1)
b1 (gs new): ∈  (no CMap)
b1 (gs old): ∈  (no CMap)

with the following ToUnicode CMaps:

CMap old:
────────────────────────────────────────
begincmap
/CMapType 2 def
/CMapName/R11 def
1 begincodespacerange
<00><ff>
endcodespacerange
2 beginbfrange
<32><32><2208>
<64><64><2308>
endbfrange
endcmap
────────────────────────────────────────

CMap 1:
────────────────────────────────────────
begincmap
/CIDSystemInfo
<< /Registry (TeX)
/Ordering (lmsy10-lm-mathsy)
/Supplement 0
>> def
/CMapName /TeX-lmsy10-lm-mathsy-0 def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
0 beginbfrange
endbfrange
0 beginbfchar
endbfchar
endcmap
────────────────────────────────────────

CMap 1-new:
────────────────────────────────────────
begincmap
/CMapType 2 def
/CMapName/R11 def
1 begincodespacerange
<00><ff>
endcodespacerange
2 beginbfrange
<32><32><00>
<64><64><00>
endbfrange
endcmap
────────────────────────────────────────
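
The difference between "CMap old" and "CMap 1-new" lies in the
destination of each bfrange entry: <2208> and <2308> in the former,
<00> in the latter. As a rough way to spot this kind of damage, the
sketch below lists the source codes whose destination consists only of
zeros. cmap_nul_targets is a hypothetical helper name; the sketch
assumes uncompressed CMap text on standard input and handles only the
simple one-entry-per-line <lo><hi><dst> form shown above, not array
destinations or bfchar entries.

```shell
#!/bin/sh
# Hypothetical helper: read CMap text on standard input and print the
# starting source code of each bfrange line whose destination is
# all zeros (e.g. <32><32><00> prints "32").
# Assumption: one <lo><hi><dst> entry per line, optionally separated
# by spaces; lines in any other form are ignored.
cmap_nul_targets()
{
  LC_ALL=C sed -n \
    's/^<\([0-9A-Fa-f]\{1,\}\)> *<[0-9A-Fa-f]\{1,\}> *<0\{1,\}>$/\1/p'
}
```

Fed the "CMap 1-new" text above, this prints 32 and 64 on separate
lines; fed "CMap old", it prints nothing.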

-- System Information:
Debian Release: bookworm/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.17.0-1-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=POSIX, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages ghostscript depends on:
ii  libc6   2.33-7
ii  libgs9  9.56.1~dfsg-1

ghostscript recommends no packages.

Versions of packages ghostscript suggests:
ii  ghostscript-x  9.56.1~dfsg-1

-- no debconf information

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

--- End Message ---
--- Begin Message ---
On Fri, 22 Apr 2022 00:38:38 +0200 Vincent Lefevre <vincent@vinc17.net> wrote:
> Package: ghostscript
> Version: 9.56.1~dfsg-1
> Severity: normal
> Tags: upstream
> Forwarded: https://bugs.ghostscript.com/show_bug.cgi?id=705246

The upstream bug report includes a note from Vincent that it is fixed in 
version 10.



--- End Message ---
