
Re: Getting Tamil Text out of PDF



Quoting Shrinivasan T (2017-01-30 17:58:16)
> I am trying to get the Tamil text from the PDF files generated by
> LibreOffice.
> But the glyphs are not correct.
> 
> The same works well for English.
> 
> This issue has been there for many years.
> Is there any improvement or new tools on this?
> 
> One solution is to do OCR with Google Drive.

This was discussed here recently: 
https://lists.debian.org/debian-dug-in/2017/01/msg00012.html

As I also wrote in that previous discussion, please share a concrete 
example PDF (small, preferably) and the UTF-8 text string that is 
supposed to come out of it, so that non-Tamil geeks like me can also 
help test.
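
To illustrate why the expected string matters, here is a rough Python 
sketch of how someone who cannot read Tamil could still compare the 
two. It is only an assumption of how such a test might look, not an 
existing tool: the helper names describe() and first_mismatch() are 
made up for this mail, and the "extracted" string is a hypothetical 
bad result (vowel sign emitted before its consonant), not output from 
any real PDF.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


def first_mismatch(expected, extracted):
    """Return the index of the first differing code point, or None.

    Both strings are NFC-normalised first, so that differences in
    composed vs. decomposed forms are not reported as errors.
    """
    a = unicodedata.normalize("NFC", expected)
    b = unicodedata.normalize("NFC", extracted)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Expected string supplied by the reporter (here: the word "Tamil").
expected = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"
# Hypothetical extraction result with two code points swapped.
extracted = "\u0BA4\u0BBF\u0BAE\u0BB4\u0BCD"

pos = first_mismatch(expected, extracted)
if pos is None:
    print("extracted text matches the expected string")
else:
    print(f"first difference at code point index {pos}")
    print("expected: ", describe(expected)[pos])
    print("extracted:", describe(extracted)[pos])
```

With a real PDF, the extracted string would come from whatever 
text-extraction tool is being tested; the comparison itself needs 
nothing beyond the Python standard library.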

 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
