
Re: pdf editor ?



On Mon, Mar 19, 2001 at 04:26:38PM +0100, Sven LUTHER wrote:
> Hello, ...
> 
> Seeing that most everything comes in pdf format these days, and that at least
> xpdf, acroread (well it is i386 only and non-free, but still useful) and gv
> can read and display/print this format, i asked myself if it would be possible
> to edit documents in pdf format ? 

as others have mentioned, pdf is a postscript derivative. it'll
faithfully recreate the original document on screen or on paper,
since it's got all the
	moveto this spot
	set this color
	draw a line to here and there
	pick a font and size
	moveto over here
	drawtext("Hello World")
instructions therein.

the trouble is, using programs like QuarkXPress, graphic artists
can 'kern' their text for a more (or less) pleasing effect; when
they do so, the resulting postscript code often breaks the text
at the kern point. as an example, the word "Type" is often kerned
to tuck the "y" under the "T". that might generate pseudocode
something like this:

	moveto x,y
	drawtext 'T'
	moveto h,v
	drawtext 'ype'

in the original document, the text stream knows the characters in
the string 'Type' are consecutive and make up a whole word, and
it also knows about the kerning information to get 'T' to snuggle
up to 'y'.

in the postscript (or pdf) output there's only a

	put 'T' over here
	put 'ype' over there

which could conceivably even be in reversed order -- so long as
the human reading the resulting display can grab the lexical
intent of the word, all is well. (ordering, in postscript, only
matters when items overlap -- and if they have no stroke and
identical fill color, not even then.)
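
to make that concrete, here's a rough python sketch of a layout
program breaking a word at its kern points. everything in it is
made up for illustration (KERN_PAIRS, ADVANCE, emit_fragments and
render are hypothetical names, and the numbers are not real font
metrics); it only models the idea, not what any real program emits.

```python
# hypothetical model of a layout program emitting draw commands.
# KERN_PAIRS and ADVANCE are invented values, not real font metrics.
KERN_PAIRS = {("T", "y"): -1}  # pull 'y' one unit left, under the 'T'
ADVANCE = 6                    # every glyph is 6 units wide, for simplicity


def emit_fragments(word, x, y):
    """Split `word` at each kerned pair; return (x, y, text) draw commands."""
    fragments = []
    start_x, run, pen = x, "", x
    for i, ch in enumerate(word):
        run += ch
        pen += ADVANCE
        nxt = word[i + 1] if i + 1 < len(word) else None
        kern = KERN_PAIRS.get((ch, nxt))
        if kern is not None:
            # kern point: flush the current run and restart at the new spot
            fragments.append((start_x, y, run))
            pen += kern
            start_x, run = pen, ""
    if run:
        fragments.append((start_x, y, run))
    return fragments


def render(fragments):
    """The 'page' is just a set of positioned runs; draw order is lost."""
    return set(fragments)


frags = emit_fragments("Type", 100, 700)
print(frags)  # [(100, 700, 'T'), (105, 700, 'ype')]
print(render(frags) == render(list(reversed(frags))))  # True
```

the second print is the point: drawn forwards or backwards, the
page looks the same, so nothing in the output records that 'T'
and 'ype' were ever one word.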

it's hard to re-munge that sort of randomly broken text back
into something editable.

but not impossible! i can imagine some bright soul coming up with
a sort algorithm based on locale (e.g. roman = left-to-right then
top-to-bottom, vs. arabic = right-to-left, vs. chinese, which is
traditionally set top-to-bottom) to re-assemble text fragments
into a likely "original text stream" format. (superscripts and
subscripts throw a monkey wrench into that concept, of course.)
asking it to also keep track of font sizing and style and so
forth would be an immense task. yet it's conceivably doable.
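
a minimal sketch of such a sort, in python, assuming a
left-to-right, top-to-bottom locale and fragments already pulled
out as (x, y, text) tuples. the function name and all three
thresholds (line_tolerance, space_gap, glyph_width) are my own
guesses for illustration; a real tool would need actual font
metrics, and this does nothing about superscripts, fonts or styles.

```python
def reassemble(fragments, line_tolerance=2, space_gap=3, glyph_width=6):
    """Rebuild lines of text from (x, y, text) fragments.

    Assumes left-to-right, top-to-bottom text and postscript-style
    coordinates (larger y is higher on the page). All thresholds are
    illustrative guesses, not values taken from any real font.
    """
    # top-to-bottom first, then left-to-right
    ordered = sorted(fragments, key=lambda f: (-f[1], f[0]))

    # group fragments whose baselines are close enough into one line
    lines = []
    for x, y, text in ordered:
        if lines and abs(lines[-1]["y"] - y) <= line_tolerance:
            lines[-1]["frags"].append((x, text))
        else:
            lines.append({"y": y, "frags": [(x, text)]})

    result = []
    for line in lines:
        line["frags"].sort()  # left-to-right within the line
        pieces, end_x = [], None
        for x, text in line["frags"]:
            # a wide horizontal gap is taken to be a word break
            if end_x is not None and x - end_x > space_gap:
                pieces.append(" ")
            pieces.append(text)
            end_x = x + glyph_width * len(text)
        result.append("".join(pieces))
    return result


# fragments deliberately out of order, as they might sit in the file
page = [(105, 700, "ype"), (136, 700, "here"), (100, 688, "Hello"), (100, 700, "T")]
print(reassemble(page))  # ['Type here', 'Hello']
```

a right-to-left or top-to-bottom locale would need a different
sort key and a different gap test, which is where the "based on
locale" part comes in.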

but i don't know of any such, at the moment. (doesn't mean there
aren't any, i just don't know of any. could be a big difference,
there.)

			-- any takers? :)

-- 
It is always hazardous to ask "Why?" in science, but it is often
interesting to do so just the same.
		-- Isaac Asimov, 'The Genetic Code'

will@serensoft.com
http://newbieDoc.sourceforge.net/ -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!


