on Sun, Dec 23, 2001 at 10:49:46AM -0900, Christopher S. Swingley (cswingle@iarc.uaf.edu) wrote:
> I need to write a program that extracts the ASCII text portion
> of email messages for insertion into a database. I looked at the
> libmailtools-perl package, but it doesn't look like it can deal with
> the annoying variety of mail that I may need to parse (The silly +'s
> at the end of lines, MIME-attached HTML, vcards, etc.).
>
> What I want is a filter that I pass an email in, and out pops the
> ASCII, 72-line width formatted message. All attachments, HTML mail,
> vcards and strangeness is removed.

I'm looking for something vaguely similar.

What I think I need is a tool that strictly decodes quoted-printable
mail, base64-encoded mail, and other representations that don't resolve
as plaintext. I _don't_ need to resolve HTML or other tagging formats.

The objective is to get the mail body into a form that can be scanned
for website references. I use this as part of my spam response system,
with a script that extracts URLs, strips these to the host portion,
resolves the IP, queries WHOIS, and parses the result for response email
addresses. This doesn't work on messages that are quoted-printable
(though converting the string "=2E" to "." appears to be enough for that
case) or otherwise encoded, because the plaintext isn't available.

I've explored a number of options, including munpack, uudecode, and
metamail, but none appears to do what I want reliably.

My current workaround is to pipe a message segment from the
"view-attachments" menu within mutt. I'd like to be able to run this
either from the index mode or against an mbox or maildir folder.

Peace.

--
Karsten M. Self <kmself@ix.netcom.com>       http://kmself.home.netcom.com/
 What part of "Gestalt" don't you understand?             Home of the brave
  http://gestalt-system.sourceforge.net/                   Land of the free
   We freed Dmitry! Boycott Adobe! Repeal the DMCA! http://www.freesklyarov.org
Geek for Hire                     http://kmself.home.netcom.com/resume.html
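
For illustration only, here is a minimal sketch of the decode-then-scan step described above, using the `email` package from Python's standard library rather than any of the tools named in the message. The function names (`decoded_text_parts`, `hosts_in_message`) and the URL pattern are assumptions made for this example. It handles only `text/plain` parts, ignores HTML, and stops at the host name; the IP and WHOIS lookups are left out.

```python
import email
import re
from email import policy
from urllib.parse import urlsplit

# Deliberately loose pattern (an assumption for this sketch): http/https
# URLs up to the next whitespace, quote, or closing bracket.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def decoded_text_parts(raw_message: bytes):
    """Yield the decoded text of each text/plain part of a raw message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        if part.get_content_type() == "text/plain":
            # get_content() reverses quoted-printable or base64 and
            # applies the declared charset, returning a str.
            yield part.get_content()


def hosts_in_message(raw_message: bytes):
    """Return the distinct host names of URLs found in the decoded text."""
    hosts = []
    for text in decoded_text_parts(raw_message):
        for url in URL_RE.findall(text):
            # Drop sentence punctuation that the loose pattern picks up.
            host = urlsplit(url.rstrip(".,;:")).hostname
            if host and host not in hosts:
                hosts.append(host)
    return hosts
```

To run this against a whole folder, the standard `mailbox` module can iterate over mbox or maildir storage and hand each message's bytes to `hosts_in_message`. A part that declares a charset Python does not know may raise `LookupError`, so a real script would want to catch that.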