Quoting Shrinivasan T (2017-01-30 17:58:16)
> I am trying to get the Tamil text from PDF files generated by
> LibreOffice.
> But the glyphs are not correct.
>
> The same works well for English.
>
> This issue has been there for many years.
> Is there any improvement or new tools for this?
>
> One solution is to do OCR with Google Drive.

This was discussed here recently:
https://lists.debian.org/debian-dug-in/2017/01/msg00012.html

As I also wrote in that previous discussion, please share a concrete
example PDF (preferably small) and the UTF-8 text string that is
supposed to come out of it, so that non-Tamil geeks like me can also
help test.

 - Jonas

--
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private