Re: howto extract data from...
Thanks,
That worked well...
On Fri, 2003-06-13 at 17:32, Ben Kal wrote:
> On 11 Jun 2003 Grzesiek Sedek <grzesiek@mi2.hr> wrote:
>
> > Does anyone have an idea how to extract clear text from an inbox file (the
> > actual file is from MS Entourage on the Mac and is called Messages)? It got
> > corrupted and the mail client does not read it. It's quite big, 500 MB, so I
> > have to do it at least semi-automatically. The main problem is the
> > attachments (I don't need them): they are quite big; the rest of the content
> > is text.
>
> You do not describe what the contents of the file look like, so I must
> guess at what distinguishes attachments from message texts.
>
> My guess then is that the 500 MB file is essentially a text file, and that
> the attachments you want to get rid of are big solid blocks of characters:
> long sequences of lines, all of the same length, without any spaces in them.
> If that is true, a simple sed command will suffice:
>
> sed -e '/^[^ ][^ ]*$/d' Messages > Messages_attachments_stripped
>
> This says: delete all lines that are not empty and do not contain spaces.
> Be careful. You may want to refine the regular expression that selects
> the lines to be deleted. As it stands, a line like
> -------------------------------
> that someone may have used in a message text to make a line stand out
> as a header, will also be deleted, as well as lines delimiting parts
> of messages, like
> --346095821--1674543256--1308352331
>
> Ben
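
One possible refinement of the regular expression, along the lines Ben
suggests, is sketched below. It assumes (this is not confirmed anywhere in
the thread) that the attachments are base64-encoded, so that their lines
consist only of the characters A-Z, a-z, 0-9, "+", "/" and "=" and are at
least 60 characters long. The function name strip_base64 and the threshold
of 60 are illustrative choices, not something from the original message.

```shell
# Sketch only: delete lines that look like base64 attachment data.
# Assumption: attachment lines are 60 or more characters long and contain
# nothing but base64 characters. Lines of dashes and MIME boundary lines
# contain "-" and are therefore kept.
strip_base64() {
    # \|...| uses "|" as the regex delimiter so that "/" needs no escaping.
    sed -e '\|^[A-Za-z0-9+/=]\{60,\}$|d' "$@"
}

# Usage (file names as in the message above):
#   strip_base64 Messages > Messages_attachments_stripped
```

The last line of each encoded block is usually shorter than the rest, so it
may survive this filter. Ordinary text lines that happen to be 60 or more
characters long with no spaces or punctuation, such as a long bare hostname
without dots, would also be deleted, so it is worth checking the output on a
small sample first.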
--
Grzesiek Sedek <grzesiek@mi2.hr>