Henry Chang, on Thu, 27 Oct 2022 12:08:20 -0400, wrote:
> I found that the original 11470644.pdf is formatted in two columns. The texts
> on a line of the first column messed up with the texts on the line of the
> second column at the same position.

Perhaps you can use pdfcrop and pdftk to split the pages into their left and right halves, and then join them together again into a single PDF file that you can feed to tesseract.

Samuel
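
A rough sketch of the pdfcrop/pdftk idea, in case it helps. It only builds and prints the command lines and does not run anything. The page size (612x792 points, US Letter), the split exactly at mid-page, and the file names left.pdf, right.pdf and joined.pdf are all assumptions for illustration; the function name is made up, and the real column boundary of 11470644.pdf would need to be checked by hand.

```python
import shlex


def split_columns_commands(src, page_width=612, page_height=792):
    """Build argv lists for splitting a two-column PDF; nothing is executed.

    page_width and page_height are in PDF points and are assumed values,
    not measured from the actual document.
    """
    mid = page_width // 2  # assumed column boundary: the middle of the page
    # pdfcrop --bbox takes "<left> <bottom> <right> <top>"
    left_bbox = f"0 0 {mid} {page_height}"
    right_bbox = f"{mid} 0 {page_width} {page_height}"
    return [
        ["pdfcrop", "--bbox", left_bbox, src, "left.pdf"],
        ["pdfcrop", "--bbox", right_bbox, src, "right.pdf"],
        # pdftk shuffle interleaves pages: left 1, right 1, left 2, right 2, ...
        ["pdftk", "A=left.pdf", "B=right.pdf", "shuffle", "A", "B",
         "output", "joined.pdf"],
    ]


if __name__ == "__main__":
    for cmd in split_columns_commands("11470644.pdf"):
        print(shlex.join(cmd))
```

The shuffle step is what keeps the reading order: each page's left column comes first, followed by its right column.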