Re: A Bit of a Strange Situation
It should be possible to use the contents of index.html to set the order
for concatenation too.

On Thu, 25 Aug 2011, Bob Proulx wrote:
> RiverWind wrote:
> > The idea was to concat a large html file and then convert it to
> > text. The pdf can be converted to text, and it so far seems like a
> > pretty viable translation.
>
> If I were going to do that for myself I would convert each individual
> html file to text first and then concatenate the individual text
> files. The reason being that the individual html files are at that
> moment completely consistent. Individually they should be able to
> convert to text cleanly with no problems. And then the text can be
> concatenated. But once you concatenate the html then you have created
> a Frankenstein html file that is almost certainly going to be
> problematic to convert to text.
>
> Also, my naive experience with this is that converting html to text is
> a lot easier than converting pdf to text. With html it is already a
> text type. The mime type is "text/html" after all. But pdf has been
> less accessible for conversions for me. The mime type is
> "application/pdf" and isn't a text type. That leaves more room
> for error.
>
> Bob
>
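
For what it's worth, here is a rough sketch of how the two ideas might
fit together: read the link order out of index.html, convert each page
to text on its own, and only then join the text. It assumes index.html
links to the pages with relative hrefs ending in ".html", which may not
match the real site. The function names (ordered_pages, html_to_text,
build_text) are made up for illustration, and the converter is a crude
one built on Python's standard html.parser module; a dedicated
html-to-text tool would likely give nicer formatting.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


class TextExtractor(HTMLParser):
    """Crude html-to-text: keep character data, skip script and style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def ordered_pages(index_html):
    """Return local .html targets in the order index.html links to them."""
    collector = LinkCollector()
    collector.feed(index_html)
    collector.close()
    seen, pages = set(), []
    for href in collector.links:
        name = href.split("#", 1)[0]  # drop any fragment
        # Assumption: pages are relative links ending in .html
        if name.endswith(".html") and "://" not in name and name not in seen:
            seen.add(name)
            pages.append(name)
    return pages


def html_to_text(html):
    """Convert one self-contained html page to plain text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    raw = "".join(extractor.parts)
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def build_text(directory):
    """Convert each page separately, then concatenate the text in index order."""
    root = Path(directory)
    index = (root / "index.html").read_text(encoding="utf-8", errors="replace")
    chunks = []
    for name in ordered_pages(index):
        page = root / name
        if page.is_file():
            html = page.read_text(encoding="utf-8", errors="replace")
            chunks.append(html_to_text(html))
    return "\n\n".join(chunks)
```

Calling build_text on the directory that holds index.html would return
one string with the pages in link order. Since each page is parsed by
itself, no combined html file ever has to exist.
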
Jude <jdashiel@shellworld.net>
"I love the Pope, I love seeing him in his Pope-Mobile, his three feet
of bullet proof plexi-glass. That's faith in action folks! You know he's
got God on his side."
~ Bill Hicks