
Re: how to extract data from...



Thanks,
That worked well...
On Fri, 2003-06-13 at 17:32, Ben Kal wrote:
> On 11 Jun 2003 Grzesiek Sedek <grzesiek@mi2.hr> wrote:
> 
> > Does anyone have an idea how to extract the plain text from an inbox file?
> > (The actual file, called Messages, is from Microsoft Entourage on the Mac.)
> > It got corrupted and the mail client cannot read it. It is quite big,
> > 500 MB, so I have to do this at least semi-automatically. The main problem
> > is the attachments (I don't need them): they are quite big, while the rest
> > of the content is text.
> 
> You do not describe what the contents of the file look like, so I must
> guess at what distinguishes attachments from message texts.
> 
> My guess, then, is that the 500 MB file is essentially a text file, and that
> the attachments you want to get rid of are big solid blocks of characters:
> long sequences of lines, all of the same length, without any spaces in them.
> If that is true, a simple sed command will suffice:
> 
> sed -e '/^[^ ][^ ]*$/d' Messages > Messages_attachments_stripped
> 
> This says: delete all lines that are not empty and do not contain spaces.
> Be careful. You may want to refine the regular expression that selects
> the lines to be deleted. As it stands, a line like
> -------------------------------
> that someone may have used in a message text to make a line stand out
> as a header, will also be deleted, as well as lines delimiting parts
> of messages, like
> --346095821--1674543256--1308352331
> 
> Ben
-- 
Grzesiek Sedek <grzesiek@mi2.hr>
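Ben's caution about refining the regular expression can be taken one step further. A minimal sketch, assuming the attachments are MIME base64-encoded, so every full line of an attachment is a long run of base64 alphabet characters (A-Z, a-z, 0-9, `+`, `/`), possibly ending in one or two `=` padding characters. Only such lines of 60 or more characters are deleted; the dash rule lines and MIME boundary lines Ben mentions contain `-`, which is not in the base64 alphabet, so they survive. The helper name `strip_base64` and the 60-character threshold are illustrative assumptions, not from the thread. The short final line of each encoded block is also kept, and if the file has CRLF line endings the CRs should be stripped first (e.g. with `tr -d '\r'`), because `$` will not match before a trailing CR.

```shell
#!/bin/sh
# strip_base64: hypothetical helper (not from the thread).
# Reads a mailbox on stdin and writes it to stdout without lines that look
# like base64: 60+ characters from the base64 alphabet, optionally followed
# by up to two '=' padding characters. Assumes attachments are base64-encoded.
# '|' is used as the sed address delimiter so '/' can appear unescaped
# inside the bracket expression.
strip_base64() {
    sed -E '\|^[A-Za-z0-9+/]{60,}={0,2}$|d'
}
```

Usage, with the file names from the thread: `strip_base64 < Messages > Messages_attachments_stripped`.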


