Re: How to extract text from PDF?



Tzafrir Cohen wrote:
> On Thu, Mar 06, 2008 at 02:55:13PM +0100, Johannes Wiedersich wrote:
>> Andrius wrote on 2008-03-06 00:20:
>>> technical question: is it possible to extract text from PDF? From PDF to
>>> txt.
>> There have already been some useful suggestions.
>>
>> One more: on kpdf, the right mouse button selects a rectangle whose
>> contents can be copied to the clipboard. Useful for pasting small parts
>> of text to other applications. For full pages/documents try pdftotext.
>
> Likewise xpdf and evince. Or the command-line pdftotext. They all use
> the same basic PDF library (xpdf / poppler), so they'll probably handle
> PDFs rather similarly.
>
This is true when the PDF mixes text with pictures, but not when the
PDF contains only pictures, for example if you scanned it with
gscan2pdf without gocr.

In that case there is no text to extract, and you need to run gocr on
each page.
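
A rough sketch of that fallback in Python, in case it helps. It assumes
pdftotext and pdftoppm (from the xpdf/poppler utilities) and gocr are
installed and on the PATH. The function names are made up for
illustration, and the "no text came out, so it must be a scan" check is
only a heuristic:

```python
import glob
import os
import subprocess
import tempfile


def has_text_layer(text):
    """Heuristic: pdftotext printed something other than whitespace/form feeds."""
    return bool(text.strip())


def pdftotext_cmd(pdf_path):
    # "-" as the output file makes pdftotext write to stdout.
    return ["pdftotext", pdf_path, "-"]


def pdftoppm_cmd(pdf_path, prefix, dpi=300):
    # Renders every page to a grayscale image named <prefix>-<page>.pgm.
    return ["pdftoppm", "-r", str(dpi), "-gray", pdf_path, prefix]


def gocr_cmd(image_path):
    # gocr prints the recognized text to stdout.
    return ["gocr", "-i", image_path]


def run(cmd):
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def extract_text(pdf_path):
    """Try the text layer first; fall back to per-page OCR for scanned PDFs."""
    text = run(pdftotext_cmd(pdf_path))
    if has_text_layer(text):
        return text
    with tempfile.TemporaryDirectory() as tmp:
        run(pdftoppm_cmd(pdf_path, os.path.join(tmp, "page")))
        # Assumption: page numbers in the file names are zero-padded,
        # so a plain lexical sort keeps the pages in order.
        pages = sorted(glob.glob(os.path.join(tmp, "page-*.pgm")))
        return "\f".join(run(gocr_cmd(page)) for page in pages)
```

Calling extract_text("scan.pdf") should then return the text either
way. OCR quality depends a lot on the scan resolution, so the 300 dpi
default is just a starting point.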


--
Could you at least use man?
    Jabka Atu (aka mha13/Mashrom Head) || bsh83.blogspot.com