On Wed, Nov 26, 2025 at 12:29:14PM -0500, rhkramer@gmail.com wrote:
> Does anybody here know of an AWK or sed program to convert mbox files to HTML?
>
> My google (well, ddg) fu has not been very helpful -- I've turned up
> proprietary programs to do that, most of which run on Windows :-(

Don't. Mbox is not even well specified, so you'll need quite a bunch of
heuristics to get it (mostly) right. Officially, the separator between
messages is a "From " at the beginning of a line -- that's why you see
it sometimes escaped with a ">" in front of it whenever it occurs in the
middle of a message. But not every agent does that, so most programs try
to be smarter about it.
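
To make the separator problem concrete, here is a minimal Python sketch of
such a heuristic. It is an illustration under my own assumptions, not a
specification: the names split_mbox and FROM_LINE are made up, the regular
expression only accepts "From " lines that end in a four-digit year, and a
separator is only recognised at the start of the file or after a blank line.
Real-world mbox variants will break at least one of those assumptions.

```python
import re

# Assumed heuristic: "From ", one non-space token (the sender), a space,
# then anything that ends in a four-digit year. Not every mbox variant
# writes its From_ lines this way.
FROM_LINE = re.compile(r"^From \S+ .*\d{4}\s*$")
QUOTED_FROM = re.compile(r"^>+From ")


def split_mbox(text):
    """Split mbox text into a list of raw message strings."""
    messages = []
    current = None      # lines of the message being collected
    prev_blank = True   # the start of the file counts as "after a blank line"
    for line in text.splitlines(keepends=True):
        if prev_blank and FROM_LINE.match(line):
            if current is not None:
                messages.append("".join(current))
            current = []  # the From_ line itself is dropped
        elif current is not None:
            # Undo ">From " quoting by stripping one ">" (mboxrd style).
            if QUOTED_FROM.match(line):
                line = line[1:]
            current.append(line)
        prev_blank = line.strip() == ""
    if current is not None:
        messages.append("".join(current))
    return messages
```

Note that stripping one ">" is only right if the writing agent escaped in the
mboxrd style; with other variants this step can corrupt a line that really
began with ">From ".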

Partially quoting Wikipedia [1]:

  "All messages in an mbox mailbox are concatenated and stored as
  plain text in a single file. Each message starts with the four
  characters "From" followed by a space (the so-called "From_ line")
  and the sender's email address. RFC 4155 defines that a UTC
  timestamp follows after another separating space character."

And:

  "Unlike the Internet protocols used for the exchange of email, the
  format used for the storage of email has never been formally
  defined through the RFC standardization mechanism and has been
  entirely left to the developer of an email client. However, the
  POSIX standard defined a loose framework in conjunction with the
  mailx program [...]"

The whole Wikipedia article is worth reading, as is often the case.

Have a look at the programs in the Debian "maildrop" package. Excerpt
from the package description:

  maildrop also comes with the following additional programs:

   * reformail, an e-mail reformatting tool, which can detect duplicate
     messages, manipulate message headers, split mailboxes into
     individual messages, and generate autoreply messages
   * maildirmake, which creates maildirs, and maildir folders
   * deliverquota, which delivers mail to maildirs while taking account
     software-imposed quotas
   * reformime, a utility for reformatting MIME messages
   * makemime, which creates MIME-formatted messages of arbitrary complexity
   * lockmail, which creates dot-locks, file locks, and C-Client folder locks
   * mailbot, a MIME-aware autoresponder utility

And, as Greg wrote elsewhere in this thread, you'll have lots of
fun taking apart MIME (some of the above programs might help with
that) and deciding how to express all that stuff as HTML.
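
If a general-purpose language is acceptable instead of AWK or sed, the
splitting and the MIME handling can be left to a library. The following is a
rough sketch using only Python's standard library (mailbox, email, html); it
is my suggestion, not something from the maildrop package. The function names
message_to_html and mbox_to_html are made up, and the choice to show just four
headers plus the text/plain part (ignoring HTML parts and attachments) is an
assumption made to keep it short.

```python
import html
import mailbox
from email import policy
from email.parser import BytesParser


def message_to_html(msg):
    """Render one EmailMessage as a small HTML fragment."""
    rows = []
    for name in ("From", "To", "Date", "Subject"):
        value = str(msg.get(name, ""))
        rows.append(f"<tr><th>{name}</th><td>{html.escape(value)}</td></tr>")
    # Only the text/plain body is shown; everything else is ignored here.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else "(no text/plain part)"
    return (
        "<article>\n<table>\n" + "\n".join(rows) + "\n</table>\n"
        f"<pre>{html.escape(body)}</pre>\n</article>"
    )


def mbox_to_html(path):
    """Convert every message in the mbox file at `path` to one HTML page."""
    # The factory makes the mailbox hand out modern EmailMessage objects.
    parse = BytesParser(policy=policy.default).parse
    box = mailbox.mbox(path, factory=parse, create=False)
    try:
        parts = [message_to_html(m) for m in box]
    finally:
        box.close()
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        + "\n<hr>\n".join(parts)
        + "\n</body></html>\n"
    )
```

Something like print(mbox_to_html("inbox.mbox")) would then write the page to
standard output (the file name is only an example). Messages with broken or
unknown charsets may still raise errors here, which is exactly the kind of fun
mentioned above.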
Cheers
[1] https://en.wikipedia.org/wiki/Mbox
--
tomás