
Re: OT - Tool for getting text body of email



on Sun, Dec 23, 2001 at 10:49:46AM -0900, Christopher S. Swingley (cswingle@iarc.uaf.edu) wrote:
> I need to write a program that extracts the ASCII text portion
> of email messages for insertion into a database.  I looked at the
> libmailtools-perl package, but it doesn't look like it can deal with
> the annoying variety of mail that I may need to parse (The silly +'s
> at the end of lines, MIME-attached HTML, vcards, etc.).
> 
> What I want is a filter that I pass an email in, and out pops the
> ASCII, 72-line width formatted message.  All attachments, HTML mail,
> vcards and strangeness are removed.

I'm looking for something vaguely similar.

I think what I'm looking for is a tool that will strictly decode
quoted-printable mail, base64-encoded mail, and other representations
that don't resolve as plaintext.  I _don't_ need to resolve HTML or
other tagging formats.
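
For illustration only, here is a minimal sketch of that kind of strict
decoder, assuming Python and its standard email package are acceptable.
The function name plain_text_body is hypothetical.  It keeps only the
text/plain parts, undoes their transfer encoding, and makes no attempt to
interpret HTML.

```python
import email
from email import policy


def plain_text_body(raw_bytes):
    """Return the decoded text/plain parts of a raw message, joined together.

    Hypothetical helper: HTML, vcards, and attachments are skipped, not converted.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    parts = []
    for part in msg.walk():
        # Multipart containers and non-plaintext parts are ignored.
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        # get_content() undoes quoted-printable/base64 and applies the charset.
        parts.append(part.get_content())
    return "\n".join(parts)
```

Reading a message on standard input and printing the result, e.g.
print(plain_text_body(sys.stdin.buffer.read())), would turn this into a
filter.  A message declaring an unknown charset may still need special
handling.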

The objective is to get the mail body into a form that can be scanned
for website references.  I use this as part of my spam response system,
with a script that extracts URLs, strips these to the host portion,
resolves the IP, queries WHOIS, and parses this for response email
addresses.
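
The pipeline described above might look roughly like the following in
Python.  This is a sketch under stated assumptions, not my actual script:
the function names, the URL and address patterns, and the default WHOIS
server (whois.arin.net) are all illustrative choices.

```python
import re
import socket
from urllib.parse import urlsplit

# Assumed patterns; real spam will contain URLs these miss.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_hosts(text):
    """Return unique host names of http(s) URLs in text, in order of appearance."""
    hosts = []
    for url in URL_RE.findall(text):
        try:
            host = urlsplit(url).hostname  # lowercased, port and userinfo dropped
        except ValueError:
            continue  # malformed URL, skip it
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def resolve(host):
    """Resolve a host name to an IPv4 address string (needs network access)."""
    return socket.gethostbyname(host)


def whois(query, server="whois.arin.net", port=43):
    """Send one WHOIS query and return the raw reply (server is an assumption)."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def contact_addresses(whois_text):
    """Return the sorted, de-duplicated email addresses found in a WHOIS reply."""
    return sorted(set(EMAIL_RE.findall(whois_text)))
```

Chained together, this would be something like
contact_addresses(whois(resolve(host))) for each host returned by
extract_hosts(body).  Which registry's server to ask depends on the
address block, so the default here will not always be the right one.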

This isn't possible on messages which are quoted-printable (though simple
cases can be handled by converting the string "=2E" to "."), or otherwise
encoded, e.g. base64, where the plaintext isn't available at all.
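
Rather than substituting "=2E" by hand, a body fragment with no MIME
headers attached (such as a segment piped out of a mail client) could be
decoded in full.  A small sketch, assuming Python's standard quopri
module; decode_qp is a hypothetical name:

```python
import quopri


def decode_qp(body_bytes):
    """Decode a quoted-printable body fragment; input and output are bytes."""
    # Handles every =XX escape and removes soft line breaks ("=" at end of line).
    return quopri.decodestring(body_bytes)
```

The result is still bytes in whatever charset the sender used, which is
enough for pulling out URLs.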

I've explored a number of options, including munpack, uudecode, and
metamail, but none appears to do what I want reliably.  My current
workaround is to pipe a message segment from the "view-attachments" menu
within mutt.  I'd like to be able to run this from either the index
mode, or against an mbox or maildir folder.
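
Running against a whole folder could be sketched with Python's standard
mailbox module, which reads both formats.  The names open_folder and
iter_raw_messages are hypothetical, and the "directory means maildir"
test is an assumption that suits a typical mutt setup.

```python
import mailbox
import os


def open_folder(path):
    """Open path as a Maildir if it is a directory, otherwise as an mbox file."""
    if os.path.isdir(path):
        return mailbox.Maildir(path, create=False)
    return mailbox.mbox(path, create=False)


def iter_raw_messages(path):
    """Yield (key, raw_message_bytes) for every message in the folder."""
    box = open_folder(path)
    try:
        for key in box.iterkeys():
            yield key, box.get_bytes(key)
    finally:
        box.close()
```

Each raw message could then be handed to whatever decoder is in use.  A
single message piped from the index would not need this step at all.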

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What part of "Gestalt" don't you understand?              Home of the brave
  http://gestalt-system.sourceforge.net/                    Land of the free
We freed Dmitry! Boycott Adobe! Repeal the DMCA! http://www.freesklyarov.org
Geek for Hire                      http://kmself.home.netcom.com/resume.html


