Re: Restating question "How to manipulate PDF documents in Debian?"
On Tue 22 Jul 2025 at 10:14:37 (-0500), Richard Owlett wrote:
> On 7/20/25 5:52 AM, Richard Owlett wrote:
> > I'm running Debian 12.8.
> >
> > I have a 100+ page PDF document.
> > I wish to extract 2 of those pages, each to their own PDF file.
[ … ]
> I should have put more "em-FAY-sis" on my goal for this thread being
> learning how to extract specific pages of a large PDF document.[1] I
> had not fully appreciated how graphically oriented the PDF format is.
>
> The sub-goal being to perceive the byte level structure of *that*
> page in order to extract the semantic content perceived by a human. I
> would then edit/reformat the content to be *useful* to a different
> target audience.
It's very simple to burst a document into individual pages with pdftk:
$ pdftk document.pdf burst
$
The pages, named pg_0001.pdf, pg_0002.pdf, etc. will be in the
working directory, and it may create a file doc_data.txt
containing some metadata, which you can ignore.
Be warned that it will overwrite any existing files with these names,
so run it in the right place. (I use a script that bursts into a
temporary directory and then uses mv -i to move the pages, which
gives more control.)
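
A rough sketch of that kind of wrapper follows. This is not the actual script, only an illustration under some assumptions: burst_safely is a made-up function name, the destination directory argument is my own addition, and doc_data.txt is simply thrown away along with the temporary directory. It only relies on the plain "pdftk file burst" form shown above.

```shell
#!/bin/sh
# burst_safely SRC [DEST]
# Hypothetical helper (the name is illustrative, not part of pdftk):
# burst SRC inside a scratch directory, then move the pages to DEST
# (default: current directory) with mv -i, so that an existing
# pg_NNNN.pdf is never overwritten without a prompt.
burst_safely() {
    src=$1
    dest=${2:-.}

    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }

    # pdftk is run from inside the scratch directory, so SRC needs
    # an absolute path.
    src_abs=$(cd "$(dirname "$src")" && pwd)/$(basename "$src")

    tmpdir=$(mktemp -d) || return 1
    mkdir -p "$dest" || { rm -rf "$tmpdir"; return 1; }

    # The subshell keeps the caller's working directory unchanged.
    if (cd "$tmpdir" && pdftk "$src_abs" burst); then
        for page in "$tmpdir"/pg_*.pdf; do
            [ -e "$page" ] || continue
            # mv -i asks before replacing a file of the same name.
            mv -i "$page" "$dest"/
        done
        status=0
    else
        status=1
    fi

    # Anything still in the scratch directory (doc_data.txt, and any
    # page whose overwrite was declined) is discarded here.
    rm -rf "$tmpdir"
    return "$status"
}
```

After sourcing that, something like "burst_safely document.pdf ./pages" would leave the individual pages in ./pages, and the two you actually want can be copied out from there.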
Cheers,
David.