You really should know what the Andrew File System is

News Analysis
May 10, 2017 | 7 mins
Cloud Computing | Enterprise Applications | IT Training

"The model of storing data in the cloud and delivering parts of it via on-demand caching at the edge is something everyone takes for granted today," one AFS creator says

When I saw that the creators of the Andrew File System (AFS) had been named recipients of the $35K ACM Software System Award, I said to myself “That’s cool, I remember AFS from the days of companies like Sun Microsystems… just please don’t ask me to explain what the heck it is.”

Don’t ask my colleagues either. A quick walking-around-the-office survey of a half dozen of them turned up mostly blank stares at the mention of the Andrew File System, a technology developed in the early 1980s and named after Andrew Carnegie and Andrew Mellon. But as the Association for Computing Machinery’s award would indicate, AFS is indeed worth knowing about as a foundational technology that paved the way for widely used cloud computing techniques and applications.


Mahadev “Satya” Satyanarayanan, a Carnegie Mellon University Computer Science professor who was part of the AFS team, answered a handful of my questions via email about the origins of this scalable and secure distributed file system, the significance of it, and where it stands today. Satyanarayanan was recognized by ACM along with John Howard, Michael Leon Kazar, Robert Nasmyth Sidebotham, David Nichols, Sherri Nichols, Alfred Spector and Michael West, who worked as a team via the Information Technology Center partnership between Carnegie Mellon and IBM (the latter of which incidentally funded this ACM prize).

Is there any way to quantify how widespread AFS use became and which sorts of organizations used it most? Any sense of how much it continues to be used, and for what?

Over a roughly 25-year timeframe, AFS has been used by many U.S. and non-U.S. universities. Many national labs, supercomputing centers and similar institutions have also used AFS. Companies in the financial industry (e.g., Goldman Sachs) and other industries have also used AFS. A useful snapshot of AFS deployment was provided by the paper “An Empirical Study of a Wide-Area Distributed File System” that appeared in ACM Transactions on Computer Systems in 1996. That paper states:

“Originally intended as a solution to the computing needs of the Carnegie Mellon University, AFS has expanded to unite about 1000 servers and 20,000 clients in 10 countries. We estimate that more than 100,000 users use this system worldwide.  In geographic span as well as in number of users and machines, AFS is the largest distributed file system that has ever been built and put to serious use.”

Figure 1 in that paper shows that AFS spanned 59 educational cells, 22 commercial cells, 11 governmental cells, and 39 cells outside the United States at the time of the snapshot. In addition to this large federated multi-organization deployment of AFS, there were many non-federated deployments of AFS within individual organizations.

What has been AFS’s biggest impact on today’s cloud and enterprise computing environments?

The model of storing data in the cloud and delivering parts of it via on-demand caching at the edge is something everyone takes for granted today. That model was first conceived and demonstrated by AFS, and is perhaps its biggest impact. It simplifies management complexity for operational staff, while preserving performance and scalability for end users. From the viewpoint of end users, the ability to walk up to any machine and use it as your own provides enormous flexibility and convenience. All the data that is specific to a user is delivered on demand over the network. Keeping all the machines that you use in sync becomes trivial. Users at organizations that deployed AFS found this an addictive capability. Indeed, it was this ability that inspired the founders of Dropbox to start their company. They had used AFS at MIT as part of the Athena environment, and wanted to enable, at wider scale, this effortless ability to keep all of a person’s machines in sync. Finally, many of the architectural principles and implementation techniques of AFS have influenced many other systems over the past decades.

How did AFS come to be created in the first place?

In 1982, CMU and IBM signed a collaborative agreement to create a “distributed personal computing environment” on the CMU campus that could later be commercialized by IBM. The actual collaboration began in January 1983. A good reference for information about these early days is the 1986 CACM paper by [James H.] Morris et al. entitled “Andrew: A Distributed Personal Computing Environment”. The context of the agreement was as follows. In 1982, IBM had just introduced the IBM PC, which was proving to be very successful. At the same time, IBM was fully aware that enterprise-scale use of personal computing required the technical ability to share information easily, securely, and with appropriate access controls. This was possible in the timesharing systems that were still dominant in the early 1980s. How to achieve this in the dispersed and fragmented world of a PC-based enterprise was not clear in 1982. A big part of the IBM-CMU collaborative agreement was to develop a solution to this problem. More than half of the first year of the Information Technology Center (1983) was spent in brainstorming on how best to achieve this goal. Through this brainstorming process, a distributed file system emerged by about August 1983 as the best mechanism for enterprise-scale information sharing. How to implement such a distributed file system then became the focus of our efforts.

What would the AFS creators have done differently in building AFS if they had to do it over again?

I can think of at least two things: one small and one big.

The small thing is that the design and early evolution of AFS happened prior to the emergence of [network address translation (NAT)]-based firewalls in networking. These are in widespread use today in homes, small enterprises, etc. Their presence makes it difficult for a server to initiate contact with a client in order to establish a callback channel. If we had developed AFS after the widespread use of NAT-based firewalls, we would have carefully rethought how best to implement callbacks in the presence of NAT firewalls.

The bigger thing has to do with the World Wide Web. The Mosaic browser emerged in the early 1990s, and Netscape Navigator a bit later. By then AFS had been in existence for many years, and was in widespread use at many places. Had we realized how valuable the browser would eventually become as a tool, we would have paid much more attention to it. For example, a browser can be used with AFS by using “file://” rather than “http://” in addresses. All of the powerful caching and consistency-maintenance machinery that is built into AFS would then have been accessible through a user-friendly tool that eventually proved to be enormously valuable. It is possible that the browser and AFS could have had a much more symbiotic evolution, as HTTP and browsers eventually did.

It looks like there are remnants of AFS alive in the open source world?

Indeed. OpenAFS continues to be an active open source project. Many institutions (including CMU) continue to run AFS in production, and this code is now based on OpenAFS.

Also, my work on the Coda File System forked off from the November 1986 version of AFS. Coda was open-sourced in the mid-1990s. That code base continues to be alive and functional today. Buried in Coda are ideas and actual code from early AFS.

Do any of you have any spectacular plans for what you’ll do with the prize money?

Nothing concrete yet.  We have discussed possibly donating the funds to a charitable cause.