Re: How to extract TABULAR data from a PDF document?

To: jeremy ardley <jeremy.ardley@gmail.com>
Cc: debian-user@lists.debian.org
Subject: Re: How to extract TABULAR data from a PDF document?
From: <tomas@tuxteam.de>
Date: Thu, 24 Apr 2025 08:48:43 +0200
Message-id: <aAney335kR3pOgLj@tuxteam.de>
In-reply-to: <38eb8d57-b243-42ad-aa53-97e9afb290ac@gmail.com>
References: <c0bc5b53-ae27-0ddc-1261-a893474783d7@access.net> <14701c9a-79ef-4dec-be96-8a7db6d4d795@gmail.com> <aAcCVrxLrpY5gKND@axis.corp> <6546606a-a237-4808-8cd5-ee95a7e4f7c8@gmail.com> <vu9jq7$grp$1@ciao.gmane.io> <0ea949c1-c191-4f47-9d6e-85434e5eedd6@gmail.com> <vuc7qh$779$1@ciao.gmane.io> <38eb8d57-b243-42ad-aa53-97e9afb290ac@gmail.com>

On Thu, Apr 24, 2025 at 11:32:23AM +0800, jeremy ardley wrote:
> 
> On 24/4/25 10:31, Max Nikulin wrote:
> > 
> > By the way, PDF files may be tagged for screen readers. Is there a
> > dedicated structure to explicitly mark tables? It would be the best
> > source for data extraction.
> 
> 
> ISO 14289 is an accessibility standard for PDF. It allows for the creation
> of a "Tagged PDF" where semantic information, including table structures
> (<Table>, <TR>, <TH>, <TD>), can be embedded in a separate logical
> structure tree.
> 
> You can download it for free at https://pdfa.org/resource/iso-14289-pdfua/

Oh, thanks for this one :)
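
For what it's worth, here is a minimal sketch of why such a structure
tree would be the nicest source for extraction. It does not read a real
PDF: the Node class and the helper names are hypothetical stand-ins for
whatever a PDF library would hand you, and I am assuming rows may sit
either directly under <Table> or inside grouping elements such as
<TBody>.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one structure element."""
    tag: str                      # structure type, e.g. "Table", "TR", "TH", "TD"
    text: str = ""                # text attached directly to this element
    kids: list = field(default_factory=list)


def cell_text(node):
    """Join the text of a cell and all of its descendants, in order."""
    parts = [node.text.strip()] + [cell_text(k) for k in node.kids]
    return " ".join(p for p in parts if p)


def table_rows(table):
    """Return the rows of one Table element as lists of cell strings."""
    rows = []

    def walk(node):
        for kid in node.kids:
            if kid.tag == "TR":
                rows.append([cell_text(c) for c in kid.kids
                             if c.tag in ("TH", "TD")])
            else:
                # Assumed grouping elements (THead, TBody, TFoot): descend.
                walk(kid)

    walk(table)
    return rows


def extract_tables(root):
    """Collect every Table below root, in document order."""
    found = []
    for kid in root.kids:
        if kid.tag == "Table":
            found.append(table_rows(kid))
        else:
            found.extend(extract_tables(kid))
    return found


# Tiny made-up document with one two-column table.
doc = Node("Document", kids=[
    Node("P", "Some introductory paragraph."),
    Node("Table", kids=[
        Node("TR", kids=[Node("TH", "Item"), Node("TH", "Count")]),
        Node("TBody", kids=[
            Node("TR", kids=[Node("TD", "apples"),
                             Node("TD", kids=[Node("P", "3")])]),
        ]),
    ]),
])

for table in extract_tables(doc):
    for row in table:
        print("\t".join(row))
```

The point is only that with tags present, rows and cells come out by
walking the tree, with no guessing from glyph positions on the page.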

Cheers
-- 
tomás
