
Bug#894068: marked as done (ocrmypdf: New dependency on PyMuPDF for v6.0.0)



Your message dated Fri, 18 May 2018 15:41:21 -0700
with message-id <CAGOGP4TO1u9CmUAya9SO4ddGA84qrm+MxDA+26Akd4Z6_6s8Aw@mail.gmail.com>
and subject line Re: Bug#894068: ocrmypdf: New dependency on PyMuPDF for v6.0.0
has caused the Debian Bug report #894068,
regarding ocrmypdf: New dependency on PyMuPDF for v6.0.0
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
894068: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=894068
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: ocrmypdf
Version: v6.0.0
Severity: serious
Tags: newcomer
Justification: fails to build from source (but built successfully in the past)

Dear Sean,

In v6.0.0, which addresses and hopefully fixes #888917, I have introduced a
new dependency on PyMuPDF (Python bindings for MuPDF).  Unfortunately PyMuPDF
isn't available in Debian as yet (I have checked there is no python3-pymupdf).

The build procedure should go like this:

  - download/unpack MuPDF to mupdf/
  - download/unpack PyMuPDF to pymupdf/
  - cp pymupdf/fitz/_mupdf_config.h mupdf/include/mupdf/fitz/config.h
  - export CFLAGS=-fPIC 
  - make HAVE_X11=no HAVE_GLFW=no HAVE_GLUT=no
  - patch pymupdf/setup.py to point library_dirs and include_dirs to the
    output of mupdf/ build
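
The local steps above (the header copy and the make invocation) can be
sketched as a small dry-run helper that only prints the commands and runs
nothing. This is an illustration, not part of either project: the function
names mupdf_build_steps and render are hypothetical, and running make from
inside mupdf/ is an assumption, since the list does not say where it is
invoked.

```python
import shlex
from pathlib import PurePosixPath


def mupdf_build_steps(mupdf_dir="mupdf", pymupdf_dir="pymupdf"):
    """Return the local build steps as (argv, cwd, extra_env) tuples."""
    mupdf = PurePosixPath(mupdf_dir)
    pymupdf = PurePosixPath(pymupdf_dir)
    return [
        # Replace MuPDF's feature configuration with the one shipped by PyMuPDF.
        (
            ["cp", str(pymupdf / "fitz" / "_mupdf_config.h"),
             str(mupdf / "include" / "mupdf" / "fitz" / "config.h")],
            None,
            {},
        ),
        # Build position-independent objects with X11, GLFW and GLUT switched off.
        # Assumption: make is run from inside the MuPDF source tree.
        (
            ["make", "HAVE_X11=no", "HAVE_GLFW=no", "HAVE_GLUT=no"],
            str(mupdf),
            {"CFLAGS": "-fPIC"},
        ),
    ]


def render(steps):
    """Format each step as a single POSIX shell command line."""
    lines = []
    for argv, cwd, env in steps:
        assignments = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
        command = " ".join(assignments + [shlex.quote(arg) for arg in argv])
        if cwd is not None:
            command = f"(cd {shlex.quote(cwd)} && {command})"
        lines.append(command)
    return lines


if __name__ == "__main__":
    print("\n".join(render(mupdf_build_steps())))
```

The download steps and the setup.py patch are left out on purpose: no
download locations or patch contents are given here, so any values would be
guesses.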

The reason for this roundabout procedure is that the vendor of MuPDF, Artifex,
does not provide or support dynamic libraries or a stable ABI, and 
compiling the Python bindings requires a dynamic library.  Perhaps as a way
to warn people about their stance, they don't enable -fPIC by default and
link their application statically.

This means that unfortunately, one cannot link to libmupdf-dev (and 
actually, I'm not sure if libmupdf-dev serves any purpose at all, unless 
it were rebuilt with -fPIC).  Certainly if the maintainers of this 
package could be persuaded to build it with -fPIC that would make this 
much easier.

I did try to build it on Debian sid against the libmupdf-dev
library. The error, as with Ubuntu, is:
  relocation R_X86_64_PC32 against symbol `fz_empty_irect' can not be
  used when making a shared object; recompile with -fPIC

The make options and the replacement header file in mupdf both disable
features that are unnecessary for PyMuPDF's purposes. This shrinks the
binary from 30 MB to 3 MB.

The PyMuPDF developers describe their build process here:
https://github.com/rk700/PyMuPDF/wiki/Ubuntu-Installation-Experience

I'm happy to help with the packaging of this dependency, and I already have
the process working for Python binary wheels.  However, I don't really
know much about Debian processes and policy.

Regards,
James

-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.4.119-boot2docker (SMP w/1 CPU core)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968), LANGUAGE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages ocrmypdf depends on:
pn  ghostscript                   <none>
pn  icc-profiles-free             <none>
pn  liblept5                      <none>
ii  python3                       3.6.5~rc1-1
pn  python3-cffi-backend-api-max  <none>
pn  python3-cffi-backend-api-min  <none>
pn  python3-img2pdf               <none>
pn  python3-pil                   <none>
ii  python3-pkg-resources         39.0.1-1
pn  python3-pypdf2                <none>
pn  python3-reportlab             <none>
pn  python3-ruffus                <none>
pn  qpdf                          <none>
pn  tesseract-ocr                 <none>
ii  zlib1g                        1:1.2.8.dfsg-5

Versions of packages ocrmypdf recommends:
pn  unpaper  <none>

Versions of packages ocrmypdf suggests:
pn  img2pdf          <none>
pn  ocrmypdf-doc     <none>
pn  python-watchdog  <none>

--- End Message ---
--- Begin Message ---
Hi Sean,

I ended up deciding to remove PyMuPDF (apart from optional tests in the test suite, anyway) from the next major release of ocrmypdf - I'll still need your support with some new dependencies, but I think I've found a solution that should be more acceptable to Debian and will work better for me as well.

-James


On Sat, 31 Mar 2018 at 08:45 Sean Whitton <spwhitton@spwhitton.name> wrote:
Hello,

On Sat, Mar 31 2018, James R Barlow wrote:

> Hello Sean,
>
> As promised ocrmypdf v6.1.2 makes pymupdf optional but recommended. My
> continuous integration tests check with and without pymupdf.
>
> The only major regression without pymupdf is that with all of:
> 1) an input file containing a mix of scanned and born digital files
> 2) --skip-text (not default)
> 3) --output-type pdf (not default)
> the output file can grow extremely large compared to the input. Past
> versions of ocrmypdf have had this issue for a long time, and now it will
> produce a warning.
>
> So it should be ready for Debian.

I will start working on the new packaging.  Thank you for the new
release.

--
Sean Whitton

--- End Message ---
