Re: How to convert a XML file of US patent into a plain text file on a Linux platform?
Hello,
Henry Chang, on Sun 23 Oct 2022 20:12:45 -0400, wrote:
> I have successfully converted a PDF file of a US patent into .png, then into .txt
> by using pdftoppm and tesseract.
pdftoppm may re-rasterize the pages. Better to use pdfimages, which just
extracts the images from the PDF unmodified.
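As a rough sketch of that pipeline, the snippet below builds and runs the two commands from Python. It assumes poppler's pdfimages and tesseract are installed and on the PATH; the helper names (pdfimages_cmd, tesseract_cmd, ocr_pdf) and the "page" file prefix are made up for illustration. Note that -png re-encodes the extracted images as PNG files without resampling them, and that a PDF may embed more than one image per page, so the output may need some sorting out.

```python
import subprocess
from pathlib import Path


def pdfimages_cmd(pdf, prefix):
    # -png writes each embedded image as <prefix>-NNN.png
    return ["pdfimages", "-png", str(pdf), str(prefix)]


def tesseract_cmd(image):
    # tesseract takes an output base name and appends ".txt" itself
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix(""))]


def ocr_pdf(pdf, workdir):
    """Extract the images of `pdf` into `workdir` and OCR each one.

    Returns the paths of the .txt files that tesseract is expected to write.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_cmd(pdf, workdir / "page"), check=True)
    texts = []
    for image in sorted(workdir.glob("page-*.png")):
        subprocess.run(tesseract_cmd(image), check=True)
        texts.append(image.with_suffix(".txt"))
    return texts
```

Calling ocr_pdf("patent.pdf", "out") (both names are placeholders) would then leave one .txt file next to each extracted image in out/.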
> I found that USPTO provides plain text in .xml files.
>
> From the USPTO website, we downloaded an XML full-text data file, ipg221011.xml.
> This file contains lots of XML files of U.S. patent data. How can I convert this
> .xml file into plain text files of US patents?
That XML file doesn't actually seem to contain the patent text.
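One way to check is to split the download into its individual documents and dump whatever text each one holds. The sketch below assumes that every document in the file begins with its own "<?xml" declaration line, and that the documents parse without needing their external DTD; neither assumption has been verified against ipg221011.xml. The function names are made up for illustration, and no particular element names are assumed.

```python
import xml.etree.ElementTree as ET


def split_documents(lines):
    """Yield each XML document as one string.

    Assumption: a new document starts at every line beginning with '<?xml'.
    """
    current = []
    for line in lines:
        if line.startswith("<?xml") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


def document_text(xml_string):
    """Return all character data of one document, whitespace-normalised."""
    root = ET.fromstring(xml_string)
    return " ".join(" ".join(root.itertext()).split())
```

Usage would be along the lines of opening the file with open("ipg221011.xml", encoding="utf-8"), passing the file object to split_documents(), and writing document_text() of each piece to its own .txt file. A document that refers to entities defined only in its DTD would raise ET.ParseError and need separate handling.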
Samuel