
Re: awk or sed program to convert mbox files to HTML



On Wed, Nov 26, 2025 at 12:29:14PM -0500, rhkramer@gmail.com wrote:
> Does anybody here know of an AWK or sed program to convert mbox files to HTML?
> 
> My google (well, ddg) fu has not been very helpful -- I've turned up 
> proprietary programs to do that, most of which run on Windows :-(

Don't. Mbox is not even well specified, so you'll need quite a few
heuristics to get it (mostly) right. By convention, the separator between
messages is a "From " at the beginning of a line -- that's why you
sometimes see it escaped with a ">" in front of it whenever it occurs at
the start of a line inside a message body. But not every agent does that,
so most programs try to be smarter about it.
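
To make the problem concrete, here is a minimal sketch in Python (not awk
or sed) of the usual heuristic: treat a line starting with "From " as a
separator only if it follows a blank line or starts the file, and strip
one ">" from ">From " lines in the body. The function name split_mbox and
the sample data are made up for illustration; real mailboxes will need
more care than this.

```python
def split_mbox(text):
    """Split mbox text into messages using a simple "From " heuristic.

    Assumption: a separator is a line starting with "From " that is
    either the first line or directly preceded by a blank line.
    """
    messages = []
    current = None
    prev_blank = True  # the start of the file counts as a blank line
    for line in text.splitlines():
        if line.startswith("From ") and prev_blank:
            # New message: store the previous one, drop the From_ line itself.
            if current is not None:
                messages.append(current)
            current = []
        elif current is not None:
            # Undo ">From " quoting (simplistic: strips exactly one ">").
            if line.startswith(">From "):
                line = line[1:]
            current.append(line)
        prev_blank = (line == "")
    if current is not None:
        messages.append(current)
    return ["\n".join(m) for m in messages]


# A well-behaved mailbox: the body line was escaped by the writing agent.
good = (
    "From a@example.org Wed Nov 26 12:00:00 2025\n"
    "Subject: one\n\nHello\n>From here on it is body text\n\n"
    "From b@example.org Wed Nov 26 12:05:00 2025\n"
    "Subject: two\n\nBye\n"
)

# A mailbox written by an agent that did not escape the body line.
bad = (
    "From a@example.org Wed Nov 26 12:00:00 2025\n"
    "Subject: one\n\nHi\n\nFrom now on, things break\n"
)
```

With the sample data, split_mbox(good) yields two messages, as intended,
but split_mbox(bad) also yields two, although the file holds only one
message: the unescaped body line is mistaken for a separator.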

Partially quoting Wikipedia [1]:

  "All messages in an mbox mailbox are concatenated and stored as
   plain text in a single file. Each message starts with the four
   characters "From" followed by a space (the so-called "From_ line")
   and the sender's email address. RFC 4155 defines that a UTC
   timestamp follows after another separating space character."

And:

  "Unlike the Internet protocols used for the exchange of email, the
   format used for the storage of email has never been formally
   defined through the RFC standardization mechanism and has been
   entirely left to the developer of an email client. However, the
   POSIX standard defined a loose framework in conjunction with the
   mailx program [...]"

The whole Wikipedia article is worth reading, as is often the case.

Have a look at the programs in the "maildrop" Debian package. Excerpt
from the package description:

  maildrop also comes with the following additional programs:
   * reformail, an e-mail reformatting tool, which can detect duplicate
                messages, manipulate message headers, split mailboxes into
                individual messages, and generate autoreply messages
   * maildirmake, which creates maildirs, and maildir folders
   * deliverquota, which delivers mail to maildirs while taking account
                    software-imposed quotas
   * reformime, a utility for reformatting MIME messages
   * makemime, which creates MIME-formatted messages of arbitrary complexity
   * lockmail, which creates dot-locks, file locks, and C-Client folder locks
   * mailbot, a MIME-aware autoresponder utility

And, as Greg wrote elsewhere in this thread, you'll have lots of
fun taking apart MIME (some of the above programs might help with
that) and deciding how to express all that stuff as HTML.
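
If a scripting language other than awk or sed is acceptable, one possible
route is Python's standard library, which ships an mbox reader and a MIME
parser. The following is only a sketch under assumptions: the function
names message_to_html and mbox_to_html are made up, it keeps only the
text/plain body, and it ignores attachments, HTML parts, and threading.

```python
import email
import html
import mailbox
from email import policy


def message_to_html(msg):
    """Render one parsed message as a small HTML fragment."""
    subject = html.escape(str(msg.get("Subject", "(no subject)")))
    sender = html.escape(str(msg.get("From", "(unknown sender)")))
    # Pick the plain-text body; other MIME parts are ignored in this sketch.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return (
        "<article>\n"
        f"<h2>{subject}</h2>\n"
        f"<p>From: {sender}</p>\n"
        f"<pre>{html.escape(text)}</pre>\n"
        "</article>"
    )


def mbox_to_html(path):
    """Convert every message in the mbox file at `path` to one HTML page."""
    box = mailbox.mbox(path, create=False)
    try:
        parts = []
        for key in box.keys():
            # Re-parse the raw bytes with the modern email policy so that
            # get_body() and get_content() are available.
            msg = email.message_from_bytes(box.get_bytes(key),
                                           policy=policy.default)
            parts.append(message_to_html(msg))
    finally:
        box.close()
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"
```

Something like print(mbox_to_html("inbox.mbox")) would then write the
page to standard output, where "inbox.mbox" stands in for your own file.
Escaping with html.escape matters here, since subjects and bodies
routinely contain "<" and "&".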

Cheers

[1] https://en.wikipedia.org/wiki/Mbox
-- 
tomás
