
Re: How to search and mine text on a two-column pdf file?



Henry Chang, on Thu, 27 Oct 2022 12:08:20 -0400, wrote:
> I found that the original 11470644.pdf is formatted in two columns. The text
> on each line of the first column gets mixed up with the text on the line of
> the second column at the same position.

Perhaps you can use pdfcrop and pdftk to split pages into the left and
the right parts, and join them together again in a single pdf file that
you can feed to tesseract.
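
A rough sketch of how the split-and-join could be scripted, in case it
helps. It only builds and prints the command lines, it does not run them.
Several things here are assumptions rather than anything checked against
the actual file: the page is taken to be US Letter (612 x 792 points), the
gutter is taken to sit exactly in the middle of the page, and left.pdf,
right.pdf and columns.pdf are placeholder names. It relies on pdfcrop's
--bbox option to force the crop box and on pdftk's shuffle operation to
interleave the two halves page by page.

```python
import shlex


def split_commands(src, width, height, out="columns.pdf"):
    """Build the pdfcrop/pdftk command lines that split every page of `src`
    into a left and a right half and interleave the halves into `out`.

    `width` and `height` are the page size in PDF points (assumed known).
    """
    mid = width / 2  # assumption: the column gutter is at the page centre
    # pdfcrop --bbox takes "left bottom right top" in points
    left_box = f"0 0 {mid:g} {height:g}"
    right_box = f"{mid:g} 0 {width:g} {height:g}"
    return [
        ["pdfcrop", "--bbox", left_box, src, "left.pdf"],
        ["pdfcrop", "--bbox", right_box, src, "right.pdf"],
        # shuffle takes one page from A, then one from B, and so on,
        # so each left column is followed by its right column
        ["pdftk", "A=left.pdf", "B=right.pdf", "shuffle", "A", "B",
         "output", out],
    ]


if __name__ == "__main__":
    # 612 x 792 is US Letter; replace with the real page size
    for cmd in split_commands("11470644.pdf", 612, 792):
        print(shlex.join(cmd))
```

The real page size can be read with a tool such as pdfinfo, and the
midpoint may need nudging if the gutter is off-centre.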

Samuel

