
Re: A Bit of a Strange Situation



It should be possible to use the contents of index.html to set the order
for concatenation too.

On Thu, 25 Aug 2011, Bob Proulx wrote:

> RiverWind wrote:
> > The idea was to concat a large html file and then convert it to
> > text. The pdf can be converted to text, and it so far seems like a
> > pretty viable translation.
> 
> If I were going to do that for myself I would convert each individual
> html file to text first and then concatenate the individual text
> files.  The reason being that the individual html files are at that
> moment completely consistent.  Individually they should be able to
> convert to text cleanly with no problems.  And then the text can be
> concatenated.  But once you concatenate the html then you have created
> a Frankenstein html file that is almost certainly going to be
> problematic to convert to text.
> 
> Also, my naive experience with this is that converting html to text is
> a lot easier than converting pdf to text.  With html it is already a
> text type.  The mime type is "text/html" after all.  But pdf has been
> less accessible for conversions for me.  The mime type is
> "application/pdf" and isn't a text type.  That leaves more room
> for errors.
> 
> Bob
> 
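A minimal sketch of the approach discussed in this thread, in Python using only the standard library. It reads index.html, takes the order of its <a href> links as the concatenation order, converts each page to text on its own, and only then joins the results. It assumes the pages are local files sitting next to index.html and linked in reading order. The function names (chapter_order, html_to_text, concat_book) are made up for illustration, and the crude tag-stripping converter is only a stand-in for a proper HTML-to-text tool.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag


class LinkCollector(HTMLParser):
    """Collect local .html/.htm link targets of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Drop any "#fragment" so chapter.html#sec2 maps to chapter.html
                path, _ = urldefrag(href)
                if path.endswith((".html", ".htm")) and path not in self.links:
                    self.links.append(path)


class TextExtractor(HTMLParser):
    """Very crude HTML-to-text: keep character data, skip script/style."""

    BLOCK_TAGS = ("p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6")

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # start block-level content on a new line

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines
        lines = [" ".join(line.split()) for line in "".join(self.parts).splitlines()]
        return "\n".join(line for line in lines if line)


def chapter_order(index_html):
    """Return linked page names in the order they appear in index.html."""
    collector = LinkCollector()
    collector.feed(index_html)
    collector.close()
    return collector.links


def html_to_text(html):
    """Convert one complete, self-consistent HTML page to plain text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()


def concat_book(directory, index_name="index.html"):
    """Convert each page to text individually, then concatenate the texts."""
    base = Path(directory)
    order = chapter_order((base / index_name).read_text(encoding="utf-8"))
    texts = []
    for name in order:
        page = base / name
        if name != index_name and page.exists():
            texts.append(html_to_text(page.read_text(encoding="utf-8")))
    return "\n\n".join(texts)
```

For example, print(concat_book("book")) would print the text of a hypothetical book/ directory in index order. Converting page by page means every parser run sees a whole, well-formed document, which is the point made above about avoiding a "Frankenstein" html file.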

Jude <jdashiel@shellworld.net>
"I love the Pope, I love seeing him in his Pope-Mobile, his three feet
of bullet proof plexi-glass. That's faith in action folks! You know he's
got God on his side."
~ Bill Hicks

