Re: Restating question "How to manipulate PDF documents in Debian?"

To: debian-user@lists.debian.org
Subject: Re: Restating question "How to manipulate PDF documents in Debian?"
From: David Wright <deblis@lionunicorn.co.uk>
Date: Tue, 22 Jul 2025 11:19:12 -0500
Message-id: <aH+6ALnr/67N7/td@axis.corp>
Reply-to: debian-user@lists.debian.org
In-reply-to: <386e1601-045f-49f8-7c56-ace4b7a25ca1@access.net>
References: <8d2c5246-742d-fd57-20ac-86e7f1d7f191@access.net> <386e1601-045f-49f8-7c56-ace4b7a25ca1@access.net>

On Tue 22 Jul 2025 at 10:14:37 (-0500), Richard Owlett wrote:
> On 7/20/25 5:52 AM, Richard Owlett wrote:
> > I'm running Debian 12.8.
> > 
> > I have a 100+ page PDF document.
> > I wish to extract 2 of those pages, each to their own PDF file.

[ … ]

> I should have put more "em-FAY-sis" on my goal for this thread being
> learning how to extract specific pages of a large PDF document.[1] I
> had not fully appreciated how graphically oriented the PDF format is.
> 
> The sub-goal being to perceive the byte level structure of *that*
> page in order to extract the semantic content perceived by a human. I
> would then edit/reformat the content to be *useful* to a different
> target audience.

It's very simple to burst a document into individual pages with pdftk:

  $ pdftk document.pdf burst
  $ 

The pages, named pg_0001.pdf, pg_0002.pdf, etc., will be in the
working directory. pdftk may also create a file doc_data.txt
containing some metadata, which you can ignore.
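
If only a couple of pages are wanted, each in its own file, pdftk's
cat operation can pull them out directly without bursting everything.
A minimal sketch, assuming the hypothetical helper name extract_pages,
placeholder page numbers, and output names of the form page_N.pdf
(none of these come from the original question):

```shell
# extract_pages is a hypothetical helper name; the output names
# page_N.pdf are likewise an assumption made for illustration.
extract_pages() {
    input=$1
    shift
    for page in "$@"; do
        # pdftk's "cat" operation copies the selected page to a new file.
        pdftk "$input" cat "$page" output "page_${page}.pdf" || return 1
    done
}
```

Calling   extract_pages document.pdf 17 42   should leave page_17.pdf
and page_42.pdf in the working directory. Like burst, it will replace
existing files of the same name.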

Be warned that it will overwrite any existing files with these names,
so run it in the right place. (I use a script that bursts into a
temporary directory and then uses   mv -i   to move the pages
with more control.)
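
The burst-into-a-temporary-directory approach can be sketched roughly
as follows. This is not the actual script, only an illustration; the
function name burst_safely and its optional destination argument are
assumptions, and it relies on realpath and mktemp from coreutils:

```shell
# burst_safely is a hypothetical name; this is an illustrative sketch,
# not the script mentioned in the message.
burst_safely() {
    input=$(realpath "$1") || return 1   # absolute path survives the cd below
    dest=${2:-.}                         # destination defaults to the current directory
    tmpdir=$(mktemp -d) || return 1
    # Burst inside the temporary directory so nothing in $dest is overwritten.
    ( cd "$tmpdir" && pdftk "$input" burst ) || { rm -rf "$tmpdir"; return 1; }
    # mv -i asks before replacing any existing file of the same name.
    mv -i "$tmpdir"/pg_*.pdf "$dest"/
    # Whatever is left (doc_data.txt, any pages you declined to move) is discarded.
    rm -rf "$tmpdir"
}
```

Running   burst_safely document.pdf   then prompts only where a
pg_NNNN.pdf already exists in the destination.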

Cheers,
David.
