Bug#986856: RFP: dangerzone -- Take potentially dangerous PDFs, office documents, or images and convert them to a safe PDF

To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Bug#986856: RFP: dangerzone -- Take potentially dangerous PDFs, office documents, or images and convert them to a safe PDF
From: Antoine Beaupre <anarcat@debian.org>
Date: Mon, 12 Apr 2021 16:33:33 -0400
Message-id: <161825961306.28991.15636880633064653504.reportbug@curie.anarc.at>
Reply-to: Antoine Beaupre <anarcat@debian.org>, 986856@bugs.debian.org

Package: wnpp
Severity: wishlist

* Package name    : dangerzone
  Version         : 0.1.5
  Upstream Author : Micah Lee
* URL             : https://dangerzone.rocks/
* License         : MIT/X or expat?
  Programming Lang: Python
  Description     : Take potentially dangerous PDFs, office documents, or images and convert them to a safe PDF

Dangerzone works like this: You give it a document that you don't know
if you can trust (for example, an email attachment). Inside of a
sandbox, Dangerzone converts the document to a PDF (if it isn't
already one), and then converts the PDF into raw pixel data: a huge
list of RGB color values for each page. Then, in a separate
sandbox, Dangerzone takes this pixel data and converts it back into a
PDF.
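
To make the "raw pixel data" step concrete, here is a minimal sketch in
Python. It is not Dangerzone's actual code: the PagePixels type and the
function names are hypothetical, and a binary PPM image stands in for the
final PDF, since building a PDF would need an imaging library. The point it
illustrates is that only a width, a height, and a flat RGB buffer cross the
boundary between the two sandboxes, and that the receiving side can check
the buffer size exactly before using it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagePixels:
    """Raw RGB data for one page (hypothetical type for this sketch)."""
    width: int
    height: int
    rgb: bytes  # width * height * 3 bytes, row-major, no header


def validate_page(page: PagePixels) -> None:
    """Reject anything that is not exactly a width x height RGB buffer."""
    if page.width <= 0 or page.height <= 0:
        raise ValueError("page dimensions must be positive")
    expected = page.width * page.height * 3
    if len(page.rgb) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(page.rgb)}")


def page_to_ppm(page: PagePixels) -> bytes:
    """Wrap validated pixels in a binary PPM (P6) header.

    PPM is used here only as a stand-in for the safe output format.
    """
    validate_page(page)
    header = f"P6\n{page.width} {page.height}\n255\n".encode("ascii")
    return header + page.rgb


# A 2x1 page: one red pixel followed by one blue pixel.
page = PagePixels(width=2, height=1, rgb=bytes([255, 0, 0, 0, 0, 255]))
ppm = page_to_ppm(page)
```

Because the second stage only ever sees plain color values, nothing from the
original file format (scripts, embedded objects, malformed structures) has a
channel through which to survive the conversion.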

 * Sandboxes don't have network access, so if a malicious document can compromise one, it can't phone home
 * Dangerzone can optionally OCR the safe PDFs it creates, so it will have a text layer again
 * Dangerzone compresses the safe PDF to reduce file size
 * After converting, Dangerzone lets you open the safe PDF in the PDF viewer of your choice, so you can make Dangerzone the default handler for PDFs and office docs and never accidentally open a dangerous document
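
As an illustration of the "no network access" point, the sketch below builds
a container invocation with networking disabled. It assumes the container
runtime's `--network none` option (available in both Docker and podman); the
function name, the image name, and the converter arguments are hypothetical
and are not taken from Dangerzone itself.

```python
from typing import List


def sandbox_command(runtime: str, image: str, args: List[str]) -> List[str]:
    """Build a container command line that has no network access.

    Only the command is constructed here; nothing is executed.
    """
    if runtime not in ("docker", "podman"):
        raise ValueError(f"unsupported runtime: {runtime}")
    # --rm removes the container afterwards; --network none leaves it with
    # no network beyond its own loopback, so it cannot reach other hosts.
    return [runtime, "run", "--rm", "--network", "none", image, *args]


# Hypothetical image and arguments, for illustration only.
cmd = sandbox_command("podman", "example/converter", ["document-to-pixels"])
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a
machine where the runtime and image exist.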

Dangerzone can convert these types of document into safe PDFs:

 * PDF (.pdf)
 * Microsoft Word (.docx, .doc)
 * Microsoft Excel (.xlsx, .xls)
 * Microsoft PowerPoint (.pptx, .ppt)
 * ODF Text (.odt)
 * ODF Spreadsheet (.ods)
 * ODF Presentation (.odp)
 * ODF Graphics (.odg)
 * JPEG (.jpg, .jpeg)
 * GIF (.gif)
 * PNG (.png)
 * TIFF (.tif, .tiff)
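
The list above translates directly into a lookup table. The following is a
small sketch, assuming a check by file extension only; the names are
hypothetical, and a real tool would likely also inspect the file contents,
since an extension says nothing reliable about what a file actually holds.

```python
from pathlib import Path

# Extensions copied from the list of supported document types above.
SUPPORTED_EXTENSIONS = frozenset({
    ".pdf",
    ".docx", ".doc",
    ".xlsx", ".xls",
    ".pptx", ".ppt",
    ".odt", ".ods", ".odp", ".odg",
    ".jpg", ".jpeg", ".gif", ".png", ".tif", ".tiff",
})


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("Report.DOCX")` is true because the comparison is
case-insensitive, while `is_supported("notes.txt")` is false.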

Dangerzone was inspired by Qubes trusted PDF, but it works in
non-Qubes operating systems. It uses containers as sandboxes instead
of virtual machines (using Docker for macOS, Windows, and
Debian/Ubuntu, and podman for Fedora).

===

The build instructions seem to say there aren't that many deps, and
possibly that all of them are in Debian already:

https://github.com/firstlookmedia/dangerzone/blob/master/BUILD.md

It seems like it also requires Docker, which we now have in Debian,
but that might be a problem for shipping with stable in the long term.

A better approach for upstream might be to use podman, or manage
containers itself, with bubblewrap or firejail or equivalent.

Those tools, obviously, are alternatives that might be a good fit as
well, but since they are much more bare bones, I think it would still
be nice to have dangerzone in Debian.

I'd be happy to co-maintain this in Debian with the Python team.
