
Re: How to extract TABULAR data from a PDF document?



On Thu, Apr 24, 2025 at 11:32:23AM +0800, jeremy ardley wrote:
> 
> On 24/4/25 10:31, Max Nikulin wrote:
> > 
> > By the way, PDF files may be tagged for screen readers. Is there a
> > dedicated structure to explicitly mark tables? It would be the best
> > source for data extraction.
> 
> 
> ISO 14289 (PDF/UA) is an accessibility standard for PDF. It allows for the
> creation of a "Tagged PDF" where semantic information, including table
> structures (<Table>, <TR>, <TH>, <TD>), can be embedded in a separate
> logical structure tree.
> 
> You can download it for free at https://pdfa.org/resource/iso-14289-pdfua/
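
To make the idea concrete, here is a minimal sketch of how rows could be
pulled out of such a structure tree once it has been read. It assumes the
tree has already been loaded by some PDF library into nested dicts; the
dict layout ("S" for the tag name, "K" for children, "text" for content)
and all function names are hypothetical, chosen only for illustration.
Real files need a PDF parser and are often tagged less cleanly.

```python
# Hypothetical in-memory form of a Tagged PDF structure tree: each node is
# a dict with a tag name "S", a list of children "K", and optional "text".
# This layout is an assumption for illustration, not a real library's API.

def cell_text(node):
    """Concatenate the text of a node and all of its descendants."""
    parts = [node.get("text", "")]
    parts.extend(cell_text(kid) for kid in node.get("K", []))
    return " ".join(p for p in parts if p)

def iter_tables(node):
    """Yield every Table node in document order (nested tables are skipped)."""
    if node.get("S") == "Table":
        yield node
        return
    for kid in node.get("K", []):
        yield from iter_tables(kid)

def iter_rows(table):
    """Yield TR nodes, descending through optional THead/TBody/TFoot groups."""
    for kid in table.get("K", []):
        if kid.get("S") == "TR":
            yield kid
        elif kid.get("S") in ("THead", "TBody", "TFoot"):
            yield from iter_rows(kid)

def extract_tables(root):
    """Return each table as a list of rows, each row a list of cell strings."""
    return [
        [
            [cell_text(c) for c in tr.get("K", []) if c.get("S") in ("TH", "TD")]
            for tr in iter_rows(table)
        ]
        for table in iter_tables(root)
    ]

# Made-up sample tree with one table: a header row and one data row.
sample = {
    "S": "Document",
    "K": [
        {"S": "P", "text": "Intro paragraph"},
        {"S": "Table", "K": [
            {"S": "THead", "K": [
                {"S": "TR", "K": [
                    {"S": "TH", "K": [{"S": "P", "text": "Name"}]},
                    {"S": "TH", "text": "Qty"},
                ]},
            ]},
            {"S": "TBody", "K": [
                {"S": "TR", "K": [
                    {"S": "TD", "text": "apples"},
                    {"S": "TD", "text": "3"},
                ]},
            ]},
        ]},
    ],
}

for row in extract_tables(sample)[0]:
    print("\t".join(row))
```

Because the rows and cells are marked explicitly, nothing has to be guessed
from glyph positions on the page, which is why a tagged file would be the
best source when one is available.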

Oh, thanks for this one :)

Cheers
-- 
tomás
