
Re: A Bit of a Strange Situation



RiverWind wrote:
> The idea was to concat a large html file and then convert it to
> text. The pdf can be converted to text, and it so far seems like a
> pretty viable translation.

If I were doing that for myself I would convert each individual html
file to text first and then concatenate the resulting text files.  The
reason is that each individual html file is a complete, self-consistent
document at that point, so each one should convert to text cleanly.
The text can then be concatenated without trouble.  But once you
concatenate the html itself you have created a Frankenstein html file,
with several html, head and body elements stacked one after another,
and that is almost certainly going to be problematic to convert to
text.
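
To illustrate the order of operations, here is a minimal sketch in
Python using only the standard library.  The names TextExtractor,
html_to_text and convert_then_concatenate are made up for this example,
and the text extraction is deliberately crude (it keeps visible text
and skips script and style contents), so treat it as an outline of
"convert each file, then concatenate" rather than a finished converter.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Convert one complete html document to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_then_concatenate(html_paths, out_path):
    """Convert each html file on its own, then join the text results."""
    texts = [
        html_to_text(Path(p).read_text(encoding="utf-8"))
        for p in html_paths
    ]
    Path(out_path).write_text("\n\n".join(texts) + "\n", encoding="utf-8")
```

Each file is parsed as a whole document on its own, so a problem in one
file cannot spill over into the text produced from the next.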

Also, my naive experience with this is that converting html to text is
a lot easier than converting pdf to text.  Html is already a text
type.  The MIME type is "text/html" after all.  But pdf has been less
accessible for conversions for me.  Its MIME type is
"application/pdf", which isn't a text type.  That leaves more room for
errors to creep into the conversion.
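
As a small aside, the MIME type difference can be checked with Python's
standard mimetypes module.  The helper name is_text_type and the file
names below are made up for illustration only.

```python
import mimetypes


def is_text_type(filename):
    """Return True if the guessed MIME type is in the text/ family."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime is not None and mime.startswith("text/")


print(mimetypes.guess_type("page.html")[0])   # text/html
print(mimetypes.guess_type("manual.pdf")[0])  # application/pdf
print(is_text_type("page.html"), is_text_type("manual.pdf"))
```

The guess is based on the file name extension only, not on the file
contents.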

Bob
