Re: Alternative to Debian Repository - extract CSV formatted data from PDF

To: debian-user@lists.debian.org
Subject: Re: Alternative to Debian Repository - extract CSV formatted data from PDF
From: Richard Owlett <rowlett@access.net>
Date: Thu, 20 Feb 2025 13:52:06 -0600
Message-id: <7af94c88-ec91-7bf3-0511-bd4e7052a530@access.net>
In-reply-to: <20250220172000.45326679@acer-suse.fritz.box>
References: <bdadb53e-59c1-6bc1-94c9-551d1a5e26b1@access.net> <20250220172000.45326679@acer-suse.fritz.box>

On 2/20/25 11:20 AM, debian-user@howorth.org.uk wrote:

> Richard Owlett <rowlett@access.net> wrote:
>
>> I wish to extract CSV formatted data from a PDF document. [1]
>> Page ES-7 has a weekly grocery list for males grouped by age.
>> I need only the first and last columns.
>>
>> Can someone point me in a suitable direction?
>>
>> TIA
>>
>> [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006
>>       Table ES-1. Thrifty Food Plan market baskets, quantities of food
>>        purchased for a week, by age-gender group, 2006
>
> If you look at
> https://www.fns.usda.gov/cnpp/thrifty-food-plan-2021 instead, you can
> find the underlying data in spreadsheet form (.xlsx). Perhaps that will
> be an adequate substitute?


You just demonstrated that "Murphy's Law" holds ;<

When I click on the link you quoted in my default browser, a PDF is displayed [actually my original starting point months ago].

If I use my alternate browser {Firefox instead of SeaMonkey} I get to choose which of several files to view. {one of them is an .xlsx file}


Murphy gets a second jab in.

The 2006 version has the data I want in a slightly different layout than the 2021 version. The first is a better match for how I do things ;/

Also, the PDFs at the two links react slightly differently when selecting with mouse movements/clicks. The 2006 version seems to allow me to select only what I want. [The 2021 version grabs everything between first and last click. The 2006 version appears to select only the columns of interest.]

Can't spend time right now to verify that first impression. Will know more this weekend.
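
If the mouse selection does pan out, the pasted text could be reduced to "first and last column" CSV with something along these lines. This is only a rough sketch under assumptions I have not checked against the PDF: that each pasted row is a text label followed by whitespace-separated numeric columns, and that rows without numbers are headings to be skipped. The function names and the sample rows in the usage note are made up for illustration.

```python
import csv
import io
import re

# A plain decimal number such as "3" or "0.25" (assumed cell format).
NUMBER = re.compile(r"^-?\d+(?:\.\d+)?$")


def first_and_last(line):
    """Return (label, last_value) for one pasted row, or None to skip it."""
    tokens = line.split()
    # Count how many tokens at the end of the row are numeric columns.
    n = 0
    while n < len(tokens) and NUMBER.match(tokens[len(tokens) - 1 - n]):
        n += 1
    if n == 0 or n == len(tokens):
        return None  # no numeric columns, or no label in front of them
    label = " ".join(tokens[: len(tokens) - n])
    return label, tokens[-1]


def to_csv(text):
    """Convert pasted table text to CSV holding the first and last columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        row = first_and_last(line)
        if row is not None:
            writer.writerow(row)
    return out.getvalue()
```

Calling `to_csv("Item one 1.00 2.00 3.50\nHeader only\n")` on those made-up rows keeps only the first row, as `Item one,3.50`. A label that itself ends in a number would be split in the wrong place, so the output needs checking against the table by eye.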


*THANK YOU*
