
Re: Good mail management techniques?



Any reason for the CCs on this mail?  I'm leaving them intact, though I
tend to post list responses directly to the list.

on Sun, Sep 09, 2001 at 04:18:53PM -0700, Ross Boylan (RossBoylan@stanfordalumni.org) wrote:
> On Sun, Sep 09, 2001 at 12:12:14PM -0700, Karsten M. Self wrote:
> .....
> > The concept you're proposing has some similarities to ideas espoused by
> > David Gelernter, whose capsule biography will always read "Yale
> > professor, computer scientist, and victim of the Unabomber (Theodore
> > Kaczynski)".  David survived the attempt on his life, though he was
> > permanently injured as a result.
> > 
> > Data on Lifestreams, Gelernter's project, is somewhat hard to find; the
> > following provides an overview:
> > 
> >     http://www.fend.es/members/magazine/march97/lifeb.html
> 
> On its face, the idea of organizing by time is different from what I
> had in mind.  However, it may be that the other classification
> facilities would give me what I was looking for.
> 
> It doesn't seem things have gotten too far, though.  A few details
> below.

[Casbah]

> > 	http://ntlug.org/casbah/
> 
> Link doesn't work.

> >   - GNU Gather.  Formerly known as PINN.
> > 
> Can't find any trace of this on GNU's site.
> 
> 
> Also there was a reference to www.lifestreams.com (as I recall); that
> didn't work either.

My own impression of LN was that it somewhat fit this description as
well:  it didn't work well.

The problem as I see it is that this isn't a problem space that maps
well to an organized solution.  I've heard and read snippets of
Gelernter's ideas from time to time, and suspect I only half get it.

The points that *do* make sense are:

  - Hierarchical categorization is ultimately futile, *if* only a single
    hierarchy can be applied.  
    
  - Allowing multiple hierarchies may provide more utility, but even
    these efforts ultimately run up against two issues:

      - Categorizing data is itself a cost of the system.  You end up
        spending time and effort applying arbitrary associations to
        content.

      - Freedom to create arbitrary associations later is highly
        valuable.  Systems such as Everything2 and Wiki are useful in
        that they allow "usage patterns" to develop through the data.
        Interestingly, Wiki tradition leans strongly *away* from
        time-marking data, while time is Gelernter's primary ordering
        basis (I've had this argument on Meatball Wiki).

  - Indexing (and searchability) is more useful than ordering.  Don't
    structure your data; instead, structure your *views* of it.  The Web
    itself is an exemplar of this:  the Web isn't structured (though
    directories such as Yahoo! and dmoz overlay structure on it);
    rather, great utility is provided by search engines.  The best of
    these (Google, Teoma, Vivisimo) use the structure and patterns of
    the Web itself to impart more meaning and value.  This is greatly
    helped by:

      - A rough document structure.  Certain elements of a document
        (any document) are relatively constant:  creation (and
        sometimes access/modification) date, author, title, abstract
        or summary, and content.  There may also be an intended
        recipient.  A minimal tagging structure that enforces this
        helps tremendously; email and Usenet lend themselves strongly
        to this, HTML/XML/SGML somewhat less so.

      - Relationships between the documents themselves:  parent/child
        relationships in mail and Usenet, links and URLs in web
        documents.

  - The indexing system must take advantage of these features of the
    data.
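
As a rough illustration of the last few points, here is a minimal
Python sketch, assuming messages are available as raw RFC 2822 text
with single-part bodies.  It uses the standard library's email package
to pull out the relatively constant fields and the parent/child link
(Message-ID and In-Reply-To), then builds a word index and a reply map
as *views* over the data instead of filing messages into a hierarchy.
The function names and sample messages are hypothetical.

```python
import re
from collections import defaultdict
from email import message_from_string


def summarize(raw):
    """Pull the 'rough document structure' out of one raw RFC 2822 message."""
    msg = message_from_string(raw)
    # Assumption: single-part messages only; multipart bodies are skipped here.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "id": msg.get("Message-ID"),
        "parent": msg.get("In-Reply-To"),  # None for thread roots
        "date": msg.get("Date"),
        "author": msg.get("From"),
        "recipient": msg.get("To"),
        "title": msg.get("Subject", ""),
        "content": body,
    }


def build_index(docs):
    """View 1: map each lowercase word in title and body to the ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        text = (doc["title"] + " " + doc["content"]).lower()
        for word in re.findall(r"\w+", text):
            index[word].add(doc["id"])
    return index


def build_threads(docs):
    """View 2: map each parent Message-ID to the ids of its direct replies."""
    children = defaultdict(list)
    for doc in docs:
        if doc["parent"]:
            children[doc["parent"]].append(doc["id"])
    return children


if __name__ == "__main__":
    raw_a = ("Message-ID: <a@example.org>\nFrom: alice@example.org\n"
             "Subject: Lifestreams\n\nOrganizing mail by time.\n")
    raw_b = ("Message-ID: <b@example.org>\nIn-Reply-To: <a@example.org>\n"
             "From: bob@example.org\nSubject: Re: Lifestreams\n\n"
             "Search beats hierarchy.\n")
    docs = [summarize(raw_a), summarize(raw_b)]
    print(sorted(build_index(docs)["lifestreams"]))  # ['<a@example.org>', '<b@example.org>']
    print(build_threads(docs)["<a@example.org>"])     # ['<b@example.org>']
```

Neither view changes how the messages are stored; adding another view
(by author, by date range) is just another function over the same data.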

Ultimately, however, a large portion of the structuring of the system
comes from the users themselves.  As such, there's only so much any one
tool or set of tools can do.  A better system imposes only minimal,
coarse-grained ordering on the data.  Remember that "database" !=
"relational database":  any collection of data can be a database.  Text
is a poor fit for most relational systems anyway, what with fixed record
lengths, poor BLOB support, and other issues.  File-based storage is
actually a pretty decent fit, particularly with an advanced, hash-based,
journaling filesystem (e.g., Reiserfs)[1].
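
To make the file-based point concrete, a minimal Python sketch,
assuming one document per file in a single flat directory, each file
named by a SHA-1 hash of its content (a hypothetical layout chosen for
illustration, not anything Reiserfs requires).  The directory itself is
the database; a "view" such as a substring search is computed at query
time rather than stored.

```python
import hashlib
from pathlib import Path


def store(root, text):
    """Save text as root/<sha1-of-text>; storing identical text twice is a no-op."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    path = root / hashlib.sha1(text.encode("utf-8")).hexdigest()
    if not path.exists():
        path.write_text(text, encoding="utf-8")
    return path


def search(root, needle):
    """A view computed on demand: sorted names of files whose text contains needle."""
    needle = needle.lower()
    return sorted(
        p.name
        for p in Path(root).iterdir()
        if p.is_file() and needle in p.read_text(encoding="utf-8").lower()
    )
```

The linear scan in search() is only a sketch; a real system would keep
an index alongside the files.  With tens of thousands of files in one
directory, this is also exactly the workload where the filesystem's
directory lookup performance starts to matter.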

On the mail front, one of the better toolsets from a net flexibility
standpoint is the 'mh' mail-handling suite.  Data are simply files,
streamed to and from stdin and stdout.  It doesn't get much easier than
that.  The overview of the Lifestreams project that I found suggests
that it's essentially built around a similar architecture.

    http://www.fend.es/members/magazine/march97/lifeb.html
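
The "messages are files, output goes to stdout" model is easy to
imitate.  A minimal Python sketch of a filter in that spirit (it is not
part of mh, and the name scan_line is hypothetical) turns each message
file named on the command line, or stdin when the argument is "-", into
a one-line summary on stdout, so it can be combined with other tools in
a shell pipeline.

```python
import sys
from email import message_from_string


def scan_line(raw):
    """Reduce one raw message to 'date  sender  subject' on a single line."""
    msg = message_from_string(raw)
    date = (msg.get("Date") or "")[:31]
    sender = (msg.get("From") or "")[:23]
    subject = msg.get("Subject") or ""
    return "{:<31} {:<23} {}".format(date, sender, subject)


if __name__ == "__main__":
    # Each argument is a message file ("-" means stdin); summaries go to stdout.
    for name in sys.argv[1:]:
        if name == "-":
            raw = sys.stdin.read()
        else:
            with open(name, encoding="utf-8", errors="replace") as f:
                raw = f.read()
        print(scan_line(raw))
```

Because input and output are plain text, sorting, filtering, or counting
the summaries is left to sort, grep, and wc rather than built in.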

--------------------
Notes:

1.  I have my own archive of roughly 125,000 files:  posts made over a
    four-year period to an online web discussion.  A full directory
    listing under Reiserfs takes six seconds from a dead start (nothing
    cached, output to /dev/null).  Once the directory is cached, it can
    be scanned in about 1.6 seconds:

	[karsten@ego:archive]$ time /bin/ls -U | wc -l
	 124657

	real    0m1.590s
	user    0m0.540s
	sys     0m0.210s

    The corresponding operations under ext2fs took many seconds, if not
    several minutes.
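
For anyone wanting to repeat this kind of measurement from a script, a
minimal Python sketch that mirrors `ls -U | wc -l`:  os.scandir()
yields entries in on-disk order without sorting, and dotfiles are
skipped as plain ls does without -a.  Timings depend heavily on cache
state and hardware, so it will not reproduce the figures above.

```python
import os
import time


def count_entries(path="."):
    """Count entries in directory order (no sorting), skipping dotfiles like plain `ls -U`."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if not entry.name.startswith("."))


if __name__ == "__main__":
    start = time.perf_counter()
    n = count_entries(".")
    elapsed = time.perf_counter() - start
    print(f"{n} entries in {elapsed:.3f}s")
```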

-- 
Karsten M. Self <kmself@ix.netcom.com>          http://kmself.home.netcom.com/
 What part of "Gestalt" don't you understand?             There is no K5 cabal
  http://gestalt-system.sourceforge.net/               http://www.kuro5hin.org
   Free Dmitry! Boycott Adobe! Repeal the DMCA!    http://www.freesklyarov.org
Geek for Hire                        http://kmself.home.netcom.com/resume.html


