On 2/20/25 11:20 AM, debian-user@howorth.org.uk wrote:
> Richard Owlett <rowlett@access.net> wrote:
>> I wish to extract CSV formatted data from a PDF document. [1]
>> Page ES-7 has a weekly grocery list for males grouped by age.
>> I need only the first and last columns.
>> Can someone point me in a suitable direction?
>> TIA
>>
>> [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006
>>     Table ES-1. Thrifty Food Plan market baskets, quantities of food purchased for a week, by age-gender group, 2006
>
> If you look at https://www.fns.usda.gov/cnpp/thrifty-food-plan-2021 instead, you can find the underlying data in spreadsheet form (.xlsx). Perhaps that will be an adequate substitute?
You just demonstrated that "Murphy's Law" holds ;<

I click on the link you quoted in my default browser and a PDF is displayed [actually my original starting point months ago].
If I use my alternate browser {Firefox instead of SeaMonkey} I get to choose which of several files to view. {one of them is an .xlsx file}
Murphy gets a second jab in.

The 2006 version has the data I want in a slightly different layout than the 2021 version. The first is a better match for how I do things ;/
Also, the PDFs at the two links react slightly differently when selecting with mouse movements/clicks. The 2006 version seems to allow me to select only what I want. [The 2021 version grabs everything between the first and last click; the 2006 version appears to select only the columns of interest.]
Can't spend time right now to verify first impression. Will know more this weekend.
*THANK YOU*
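
For the case where a mouse selection grabs every column (as the 2021 PDF seems to), here is a rough Python sketch of trimming pasted rows down to the first and last columns and writing them out as CSV. It rests on assumptions not verified against the actual table: each pasted row is a text label followed by whitespace-separated numeric cells, and the names `first_and_last` and `rows_to_csv` as well as the sample rows are made up for illustration. Title or footnote lines that happen to contain a number (a year, for instance) would slip through and need removing by hand.

```python
import csv
import io
import re

# Assumption: a table cell looks like 1, 1.25 or .75; anything else is label text.
NUMERIC = re.compile(r"^\d*\.?\d+$")


def first_and_last(line):
    """Return (label, last_cell) for one pasted row, or None if it is not a data row."""
    tokens = line.split()
    for i, tok in enumerate(tokens):
        if NUMERIC.match(tok):
            if i == 0:
                return None  # row starts with a number, so there is no label
            # Label is everything before the first numeric cell;
            # the last column is simply the final token on the line.
            return " ".join(tokens[:i]), tokens[-1]
    return None  # no numeric cell at all (heading or blank line)


def rows_to_csv(text):
    """Convert pasted table text to CSV holding only the first and last columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        pair = first_and_last(line)
        if pair is not None:
            writer.writerow(pair)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical rows, not real figures from the Thrifty Food Plan table.
    pasted = "Whole grains 0.50 0.75 1.00\nMilk 2.0 3.5\n"
    print(rows_to_csv(pasted), end="")
```

The `csv` module is used for output so that a label containing a comma gets quoted properly instead of splitting into extra columns.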