Why is enterprise IT not standardized? (was: Enterprise and Debian Pure Blends)



"Jesús M. Navarro" <jesus.navarro@undominio.net> writes:

> I know (well, not exactly: I knew you had a strong say about
> Stanford's central IT while not knowing exactly what your role was),
> and even then I dared say "uhmmm, not too sure".

> What makes Stanford such a particular IT environment, different from
> a thousand other universities around the world?

That's a very good question.

> All I can think of is just two things:
> 1) legacisms
> 2) Russ Allbery being the technical lead instead of anyone else

I don't think that's the right answer.  :)

It's not entirely the wrong answer, of course.  Every environment has its
legacy applications, and they do have a significant influence on how
things are done.  And the IT deployment strategy is always going to be
driven heavily by the expertise of the local staff.  People solve problems
along lines that match their expertise.  For example, sites with an IMAP
expert tended to do lots of things via the IMAP protocol, whereas other
sites would use IMAP somewhat grudgingly and only for mail.  Sites
with a distributed file system often have a much different IT
infrastructure than sites without one.  And so forth.

But I think these are only small special cases of the much larger
reasons why not only Stanford but almost every large enterprise is
different from every other large enterprise.  Speaking very broadly, I
think there are two reasons:

1. IT does not exist in isolation.  It is developed and deployed to meet
   the needs of the organization, and every organization has somewhat
   different needs (or they wouldn't be separate distinct organizations).
   One very important and very accurate aphorism in IT and in software
   development (Conway's law) is that your software interfaces tend to
   reflect the political structure of the organizations that developed
   them.  This tends to override even the more traditional technical
   requirements.

2. At every stage of planning an IT infrastructure, there are choices with
   differing tradeoffs.  As a result, by the time you build out the
   complete IT infrastructure required by a large enterprise, you've gone
   through a combinatorial explosion of paths.  Even assuming a very
   simplified model where each choice only has two options, after twenty
   choices there are now over one million different ways of doing IT.
   There are a lot more than twenty choices, and choices tend to have a
   lot more than two options.
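
To make the arithmetic concrete, here's a trivial sketch (Python,
purely illustrative):

    # With twenty independent decisions and only two options each, the
    # number of distinct IT designs is already over a million.
    choices = 20
    options_per_choice = 2
    print(options_per_choice ** choices)  # 1048576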

I think it's very informative to look at a typical enterprise problem and
walk through how it works at Stanford to see both how those choices are
made and why they're made differently at different sites.  Let's take a
nice, simple problem: creating a new account.  (Anyone who's designed IT
infrastructure for an enterprise is now laughing at "simple.")

Now, immediately, you have a divergence between different types of
organizations.  Small businesses probably only have one type of account:
an employee.  So, one procedure.  Larger companies will often have a
couple types of accounts: employees and contractors.  Schools have
students and staff.  Stanford has a ton of different types of accounts:
students, faculty, staff, emeritus faculty, SLAC employees, Hospital
employees, casual affiliates, guest accounts, guest wireless access,
campus residents with no other affiliation... it goes on.  They all have
their own unique differences for account provisioning.

But let's simplify.  Let's look at account creation for only one specific
case: a newly admitted student.  Here's roughly what happens at Stanford:

 1. New student information entered into Peoplesoft by Admissions.

 2. Peoplesoft allocates a university ID, which is sent to the student.

 3. Student record harvested (via XML) from Peoplesoft by the Person
    Registry, which collects all person information from the disparate
    sources of person data at the university.

 4. Student goes to online web site (backed by a Java web application),
    enters university ID, chooses account name and password.

 5. Web application makes a remctl (middleware) call to a middleware
    service to create a new account, disabled.  (See the first sketch
    after this list.)

 6. Middleware service authenticates to KDC, creates new account via
    kadmin protocol and queues a password change for Active Directory.

 7. Java middleware checks student eligibility, determines services,
    creates additional database entries, and makes remctl call to the
    middleware service to activate the account.

 8. Middleware service authenticates to KDC, enables account via kadmin,
    and queues account enable for Active Directory.

 9. Java middleware posts internal person and account events to SonicMQ.

10. Person database to LDAP gateway harvests the person event and puts
    the new student information into the OpenLDAP master by converting
    the database information to LDIF.  (See the second sketch after
    this list.)  This uses a local suPerson schema, since none of the
    existing LDAP schemas for representing people have nearly enough
    metadata or granularity for the data we track.  The LDAP gateway
    then posts external person events to SonicMQ.

11. Account database to LDAP gateway harvests the account event and writes
    an entry into the account LDAP tree (separate from the person tree
    since a person may have multiple accounts and accounts may not be
    people).  This uses the local suAccount schema.  LDAP gateway then
    posts external account events to SonicMQ.

12. The (locally-written) Windows harvester notices an LDAP syncrepl event
    in the replication logs of the OpenLDAP person directory, retrieves
    the new person directory information from OpenLDAP, converts it to the
    Active Directory schema, and creates a new entry in Active Directory
    with a randomized password.

13. The Kerberos to Active Directory password synchronization software,
    which has been periodically retrying to synchronize the password for
    the new account and failing because it didn't exist in Active
    Directory yet, notices that the account exists now and sets the
    account password in Active Directory to match that in the main campus
    Kerberos realm.

14. The account provisioning software harvests the external account events
    posted by the LDAP gateway, one for each service that the newly
    created account is supposed to receive.  It retrieves additional
    information about each service from LDAP and writes local events into
    the account provisioning queue.

15. The account is assigned a numeric UID and given an entry in the
    account provisioning database, and a PTS entry for AFS is created
    using that UID.  That UID is written back into the account registry
    and a new internal account event is posted to SonicMQ.

16. The account LDAP gateway harvests the internal account event, pushes
    the UID information into the account directory, realizes that it now
    has all the information required for the NIS schema, and adds a NIS
    schema object to the account directory record for that account,
    which enables LDAP clients to do nsswitch lookups for that account.
    (See the third sketch after this list.)

17. The account home directory is created in AFS (using locally-developed
    scripts to do things like find the least-loaded file server and to
    create some initial home directory content and ACLs).

18. The account mailbox is created in Zimbra via a remctl call that runs a
    wrapper around the Zimbra command-line tools, pulling the additional
    information needed about the account from the person LDAP tree.  The
    provisioning database then retrieves from Zimbra the specific mailbox
    server to which the account was assigned and stores that in the
    database.

19. An hourly cron job exports a new map of usernames to Zimbra mailbox
    servers into AFS that includes the entry for the new account.

20. The DNS servers for the *.pobox.stanford.edu zone pick up the newly
    generated map and turn it into a DNS zone file that maps
    <username>.pobox.stanford.edu to the Zimbra mailbox server for that
    account, allowing the user to read mail using our recommended mail
    client configuration.  (See the fourth sketch after this list.)
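
To give a flavor of the glue code these steps imply, here are a few
minimal sketches in Python.  None of this is our actual code: the host
names, command names, attribute names, and data shapes are all invented
for illustration.  First, the remctl call in step 5, which asks the
middleware service to create a new, disabled account.  The sketch
shells out to the remctl command-line client for simplicity; the real
web application uses proper bindings and real command names:

    # Hypothetical sketch of step 5.  remctl authenticates the caller
    # to the middleware host over GSS-API with Kerberos credentials;
    # the "account create" command and its arguments are invented here.
    import subprocess

    def create_disabled_account(username):
        # Equivalent to: remctl middleware.example.edu account create NAME
        return subprocess.run(
            ["remctl", "middleware.example.edu",
             "account", "create", username],
            capture_output=True, text=True, check=True)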
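
Second, the database-to-LDIF conversion in step 10.  The DN layout and
the suPerson attributes here are rough stand-ins; the real schema
carries far more metadata than a sketch can show:

    # Hypothetical sketch of step 10: convert a Registry person record
    # into LDIF for the OpenLDAP master.  Attribute names approximated.
    def person_to_ldif(person):
        dn = "suRegID=%s,cn=people,dc=stanford,dc=edu" % person["regid"]
        return "\n".join([
            "dn: %s" % dn,
            "objectClass: suPerson",
            "suRegID: %s" % person["regid"],
            "displayName: %s" % person["name"],
        ]) + "\n"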
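
Third, the NIS schema object from step 16.  This part, at least, is a
standard: the RFC 2307 posixAccount object class, although the DN
layout below is again invented:

    # Hypothetical sketch of step 16: once the UID is assigned, add an
    # RFC 2307 posixAccount object to the account entry so LDAP-backed
    # nsswitch clients can resolve the account (LDIF modify syntax).
    def account_nis_ldif(username, uid, homedir):
        return "\n".join([
            "dn: uid=%s,cn=accounts,dc=stanford,dc=edu" % username,
            "changetype: modify",
            "add: objectClass",
            "objectClass: posixAccount",
            "-",
            "add: uidNumber",
            "uidNumber: %d" % uid,
            "-",
            "add: gidNumber",
            "gidNumber: %d" % uid,
            "-",
            "add: homeDirectory",
            "homeDirectory: %s" % homedir,
        ]) + "\n"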
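
And fourth, the map-to-zone transformation from steps 19 and 20.
Whether the zone uses CNAME or A records is a detail I'm glossing over;
this just shows the shape of the transformation:

    # Hypothetical sketch of step 20: turn the exported map of
    # usernames to Zimbra mailbox servers into DNS records so that
    # <username>.pobox.stanford.edu resolves to the right server.
    def map_to_zone(mailbox_map):
        # mailbox_map example: {"jdoe": "zm01.stanford.edu"}
        return "\n".join(
            "%s.pobox.stanford.edu. IN CNAME %s." % (user, server)
            for user, server in sorted(mailbox_map.items()))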

This is the "moderate complexity" version.  I've completely omitted some
systems, such as student ID card provisioning and the data flow for the ID
card number back into the Person Registry and from there into LDAP.

This account creation flow is, in its details, completely specific to
Stanford.  No other enterprise creates new accounts in exactly the same
way, and much of the software involved in the process is bespoke.  No
other higher educational institution creates new student accounts the
same way.

Furthermore, if you walk through this process with any other higher
educational institution, you'll find that they have their own completely
unique process, and that no other institution creates accounts the same
way *they* do, either.

So, why is that?  Well, look through each step and think about the
decisions that are involved.  At every point, some sites are making a
different decision, and at that point the flow diverges, often with
implications for every downstream system.  There are *tons* of different
choices.  Here are a few examples:

* Business decisions.  What services do students even get?  We give all
  students local mailboxes.  Many universities only give them Google Mail
  accounts.  Do they even get home directories?  Maybe they have other
  services that Stanford doesn't provide them.  When are incoming students
  eligible for accounts?  These are all university policy decisions that
  are usually not made by IT, or at least not by IT alone.

* HR/student information/business management systems.  As anyone who works
  in this space will quickly tell you, the choice of a student information
  system informs every other choice you make.  We use Peoplesoft.  Other
  sites use, for example, SAP.  Furthermore, no two installations of
  Peoplesoft or SAP are quite the same because they're implementation
  toolkits, not turnkey products.

* Choices of service provisioning technology.  We use AFS for home
  directories and Zimbra for mail service.  Other places use CIFS and
  Exchange.  Or NFS and Cyrus IMAP.  We run both Active Directory and a
  Heimdal Kerberos KDC.  Some sites have only one or the other.  We have
  an OpenLDAP environment.  Some sites might use Novell eDirectory.  Or
  Sun ONE.  Or a metadirectory product from Oracle.  Those systems have
  different assumptions and different prerequisites, and they all have
  different costs and different benefits.

* Choices of middleware technology.  We use remctl for service to service
  links.  Other sites use web services, or RPC calls, or flat file
  transfers, or ssh.  I think there's a site that uses Rx.  We use SonicMQ
  for messaging, except where we use LDAP replication logs.  You could use
  ActiveMQ.  You could roll your own event system.  Or you could skip a
  message bus entirely.

* Degree of automation.  Note that the only human interactions in this
  system are when someone from Admissions enters the student into the
  student information system and when the student selects their account
  name and password.  That's typical of a large university, but not the
  only way to do it.  Some places have staff members authorized to create
  accounts.  Accounts may be created at different points in the admissions
  process.

* Legacy factors.  We have an extensive internally-written people and
  account middleware system that we've been adapting to Stanford's needs
  for 20 years.  Most of our peer institutions also have an extensive
  internally-written people and account middleware system.  Sometimes
  it's written as add-ons to Peoplesoft or SAP, and sometimes, like
  ours, it's a separate system.  Some sites still use the Moira system
  from MIT, although each Moira deployment is at this point somewhat
  unique.

* Organizational boundaries.  Peoplesoft is run by one group.  The Person
  Registry was historically run by a different group.  The LDAP servers
  are run by a third group, which also handles account service
  provisioning and the Kerberos KDCs.  Historically, the Active Directory
  environment was run by a fourth group.  Note how the integration
  points, the places where abstracted data models are used and carefully
  adhered to, fall at the organizational boundaries: exactly where data
  had to move between organizations.  Other sites will have different
  organizational boundaries and will combine some things we keep
  separate while separating some things that we combine.

Now, look at all the code that I described in the above 20 steps.  Suppose
you're some other institution that wants to follow a best practice, or
suppose you're a software author who wants to build something for other
institutions based on what Stanford does.  How much of all the code that
I've implicitly described above do you think you could reuse?

The answer is not very much.  One of the things I try to do as much as I
can is release our tools and code as free software.  You can find remctl
in Debian, for example, and you can find the middleware layer we use to
talk to the KDC on my web site, as well as some of our AFS management
scripts.  There are other things that *could* be generalized, such as some
of the things we do to Zimbra.  And there are tons and tons of things that
are specific to Stanford's business logic, or that depend in detail on all
the other choices we made.

If you do a bunch of work to make our code more generic, how much of it
could you use then?  I'd say maybe 15-20% at most, before you run into
assumptions (SonicMQ, or Peoplesoft, or OpenLDAP, or completely different
requirements around what can be automated, or different business rules
for handling students, etc.) that just aren't true in some other
environment.  This is why most estimates say that somewhere between 80%
and 90% of the software in the world is bespoke software, written
internally for a single organization.

It's very important to never forget what _The Mythical Man-Month_ says
about software development: programming systems (roughly, code that's been
abstracted and supports reasonable configuration) are three times as hard
to write as in-house software that solves a specific local problem.
Programming products (something that could be sold to a paying customer)
are three times harder to write than *that*.

Free software falls somewhere between the two (no one writes
documentation the way Brooks was used to any longer, for example), but
someone has to do that additional work to take in-house software and turn
it into something usable by someone else.  I try to take time to do that,
but I have a *huge* backlog of things at Stanford that are generalizable
that I haven't finished getting to a releasable point.  And that's just
the stuff that *doesn't* embed Stanford business logic in deep ways.

> I'm not saying that taking advantage of "owning" an overperforming
> professional isn't a good thing, but those two things are indicative
> of craftsmanship, not a profession, much less engineering.

I think it's important to always remember that IT is not like building a
bridge or a road or an airplane.  There isn't a concrete, final goal for
IT the way that there is when you're building a bridge.  There isn't a
constrained set of possible solutions like there is for a bridge.  You
never finish an enterprise IT design and then say you're done.

IT means anything that can be done with a computer.  Anything that can be
done with a computer is, increasingly, anything that humans do.  IT is as
complex as human activity.

That's why I don't think that IT, in general, will ever become
engineering: human endeavor as a whole will never be reducible to
engineering.  The problem space and complexity space will never stand
still, because IT will always be asked to support the next level of human
endeavor.  Specific *pieces* of IT will reach their natural conclusion,
become standardized and commoditized, and become engineering, and that's
already happened with, for example, network switches.  But the IT
department of an enterprise will always have to keep moving forward to
tackle problems that are somewhere between art and craft.

Finally, let me revisit your original two explanations and say more about
why I don't think they're the right answer to why Stanford is different
from everyone else:

> 1) legacisms

Having the Registry system does lead us down particular paths, but many
other sites have ended up with something similar without having that
system.  And notice that very little in the flow above works the same
way it did 10 years ago.  Even our Peoplesoft installation isn't that
old.

> 2) Russ Allbery being the technical lead instead of anyone else

Of the complete system that I sketched out above, I was involved in
designing, even at a very partial level, only maybe 50%, and I wrote
less than 10% of the code.  And the most influential parts of the design
(the LDAP schema, the Registries design, the Peoplesoft integration) are
things I had very little to do with.  Furthermore, from talking to peer
institutions, we're not *that* special.  We do somewhat more interesting
things than schools with less funding, of course, but there are lots of
people out there like me who design systems like this.

In fact, I think somewhat the opposite is true: Stanford is probably
somewhat *more* like everyone else than it would be if I weren't here,
since I do a lot of work both on releasing our tools (so that other
sites can become more like us) and on staying in touch with the free
software world (so that we can pick up good ideas invented elsewhere).
Sites without that often seem to go down the custom professional
services path: they hire an Oracle or an SAP to come in and customize
the environment for their needs on top of a single software stack, and
they end up with something completely custom to them that's locked in
to a single vendor and that they pay to have upgraded periodically.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

