How the Apache Project Boosted the Free and Open Source Software Movements

The Apache project has undeniably changed the world. Some of its impacts include the following:

  • Their HTTPD application was the first robust, production-ready web server.
  • They have also made an impressive legal impact, as the Apache 2.0 license is the most popular in the open source world.
  • As the Apache Software Foundation expanded into other computing projects, they became home to much of today's most sought-after software: projects such as Hadoop, Spark, and Kafka—a cornucopia of 60 open source packages with official or incubator status handling big data and machine learning.

Along with these achievements, the Apache project's contributions to culture and online communities are equally significant. This article looks back at the early years of the project, which is now 26 years old. (The Apache Software Foundation, formed when the project was ready to be put on a stable footing, officially celebrates its 22nd anniversary this year on March 26.)

Most descriptions of Apache's early years focus on their technical work, so I hope this article contributes toward a better understanding of how Apache helped build the movement for open source. Reports about that period that I draw on are included at the end of the article.

A Permanent Redirect

Apache started as a group of programmers who came together over email in early 1995 to rescue a web server developed by the National Center for Supercomputing Applications (NCSA), whose support for the software was flagging. At that point in history, collaboration over the internet was still relatively new. The usual model was for a single project leader (the term "benevolent dictator" seems to have been invented in 1995) or a small team (as in the various BSD projects) to maintain control, and to represent a hub to which others could submit changes. These changes were called patches (which gave rise to the name Apache), a term that seems to underestimate the significance of contributions by outsiders.

Linux and other projects were forming communities that went beyond patch submission and allowed general participation in decision-making over email. But these communities lacked ground rules for participation and could display rough interpersonal dynamics that repelled many contributors. Debian was probably one of the earliest free software projects to distribute leadership over geographical distances, but its communication style was also confrontational (some observers would characterize it in even stronger language) to a point that most modern open source projects don't tolerate. In short, the free and open source software movements had to do some growing up.

Engineering a Community

The Apache project was much more conscious of culture and group dynamics from the start. There was no opportunity to appoint a benevolent dictator, because the person who could have played that role had already left. This is an interesting side discussion in itself.

By 1992, the World Wide Web was recognized by internet users as a major force. But the software that supported it was in a fragile state. Tim Berners-Lee had developed a web server for the European Organization for Nuclear Research (CERN, after the French name), but it was designed for the needs of the organization, which involved sharing large data sets. The server was not appropriate for the kinds of publishing that the general public was starting to do in the early 1990s.

The NCSA was also a scientific organization, but its server had a broader appeal. In the development model typical of the time, its development rested on a single programmer, Rob McCool, and other programmers sent him their patches.

This changed after Marc Andreessen, the famous developer of web browsers, left the NCSA to form Netscape. (You may well be using a descendant of his work, the Firefox web browser, to read this article.) A lot of NCSA staff followed him to Netscape, including McCool. Because Netscape was concerned only with a browser, suddenly no one was working on any web server at all.

From today's vantage point, it may seem strange that a critical part of internet infrastructure could be orphaned. But even though visionaries recognized great potential in the internet, it was still viewed as a research experiment by managers at most organizations. NCSA itself was concerned with supporting supercomputer researchers, and if it took interest in the web at all, Roy Fielding told me in an interview, it focused on the browser, just as Netscape did. NCSA didn't even announce that the web server had been effectively abandoned.

So in early 1995, according to Fielding, programmers began to notice that no one was responding when they emailed their patches to NCSA, and no updates were coming out. When they figured out what happened, they found each other over email and decided to pick up the project themselves.

The key characteristic of this oddly formed community was voluntarism. Nobody could be told what to work on. If they were interested, they wrote code. If not, they turned to other matters. One could jokingly call the team an anarcho-syndicalist commune (which fulfills the requirement of articles on computing to make at least one Monty Python reference).

This fundamental trait of the project led to the focus on respect, open communications, and other elements of what was later called The Apache Way.

Dirk-Willem van Gulik, in an interview for this article, explained that they had to "bias the rules toward action rather than inaction." In other words, they didn't dare impose high barriers to getting proposals accepted, because it would discourage people from working at all. He said the early programmers did not discuss community much, but they did look for inspiration to the Internet Engineering Task Force (IETF), famous for its dependence on "rough consensus and running code." Later in the article, we'll see how the Apache project introduced some formality into their proposal system.

What about group cohesion? Here, van Gulik says that the developers found they shared enough values and goals to work well together. From today's standpoint, the team looks homogeneous (all male, all educated, and mostly white). But they came from many countries with different cultures, and some were not fluent in English. Furthermore, according to van Gulik, the developers came from a variety of other projects with very different cultures: notably OpenBSD, Linux, Perl, and Python. So it was an achievement to form a cohesive group over long geographic distances. Most of them did not meet in person until the first ApacheCon in 1998.

A key impact of open discussion is that it educates junior members of the group, helping them step up and take on greater responsibilities while staying true to the project's goals and culture. As Thomas Østerlie says, "Through collaboration and discussion tacit knowledge is transformed to explicit knowledge."

Risky Timeouts and Turning Points

All-volunteer organizations, particularly when they lack a central, cohesive team, suffer from a major risk: that participants will simply stop contributing. Just two months after the project's formation, according to Østerlie's thesis, this actually happened. The thesis contains an account of this period based on email archives, and an analysis.

A bit of basic computer philosophy will help us understand what happened. The computer field has moved from monolithic programs, where every little change can pull a hidden lever somewhere and have destructive effects on faraway parts of the application, to modular programs, which separate different functions. Modularity is seen most prominently nowadays in microservices.

Like most software in 1995, the NCSA server was monolithic. And that was sustainable, because McCool had been responsible for the whole thing.

But now think of what happens when a dozen people in different parts of the world start making changes. A "community of peers" (as The Apache Way puts it) cannot coexist with a monolithic architecture. Furthermore, Fielding told me, modularization would reduce arguments over what features to include. People who wanted a certain feature would create a module for it, and those who didn't want the feature could simply skip installation of the module.
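To make the idea concrete, here is a minimal sketch of a modular design in Python. It is not Apache's actual module API (the real server is written in C), and every name in it (Server, install, add_server_header) is hypothetical, chosen only for illustration. The point is that the core knows nothing about individual features, so a feature nobody installs simply doesn't exist for that site.

```python
class Server:
    """A toy server core that knows nothing about individual features."""

    def __init__(self):
        # Feature name -> handler function, kept in installation order.
        self._handlers = {}

    def install(self, name, handler):
        """Add a feature by registering its handler under a name."""
        self._handlers[name] = handler

    def handle(self, request):
        """Pass the request through every installed module in turn."""
        for handler in self._handlers.values():
            request = handler(request)
        return request


def add_server_header(request):
    """A hypothetical optional module: tag each request with a server name."""
    return {**request, "server": "toy"}


if __name__ == "__main__":
    server = Server()
    # Without the module, the request passes through unchanged.
    print(server.handle({"path": "/"}))
    # Installing the module adds the feature without touching the core.
    server.install("header", add_server_header)
    print(server.handle({"path": "/"}))
```

In a design like this, a contributor who wants a feature writes one more handler, and nobody else has to review changes to the core in order to get it.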

But modularizing a large program is daunting. It can't be done through "patches." A single individual (effectively, a temporary benevolent dictator) has to re-architect the whole thing.

In the absence of modularization, people got frustrated and stopped contributing. Østerlie's claim was seconded by van Gulik, although he said the situation was not as dire as it seems from reading the thesis. Luckily, a programmer named Robert Thau took on the big job of creating a modular server, which he called the Shambhala project (after a Buddhist concept). Everyone (except one founding member who left the project) loved the change, and the Apache project picked up again.

This story illustrates the great historical value of working over mailing lists that a project archives. All conversations at this early period are available to researchers like Østerlie.

Another big turning point that put some strain on the community was the 2.0 release, which also revolved around a form of modularization. As van Gulik explained to me, some of the underlying libraries were proving useful to projects besides the web server. Many people wanted a version of the libraries that they could install without installing the server. But because separating out the libraries would be a large project without clear benefit to the people working on the server, some opposed it. Nevertheless, the release was accomplished.

The Real Meaning of +1

I'll end this article by describing how the Apache team found a way to instill the "bias toward action" mentioned earlier by van Gulik. It involved some subtle and non-intuitive decisions.

We already saw that the team rejected the benevolent dictator model. But formal democracy doesn't work either on a project where people come and go. There's no demos or constituency, just a bunch of people who turn up when they want.

Luckily, software makes it easy to accommodate most needs. As I explained earlier, people can choose the features they want, so anyone can add something they consider important. On the other hand, there had to be a coherent strategy to prevent bloat or poorly thought through changes.

Fortunately, a policy that the team came to for purely practical reasons proved valuable in preventing what van Gulik called "runaway processes." The team knew that people were in many different time zones, and that many would take off a couple of days for a weekend or a holiday at different times. So they instituted a rule that they would wait at least 72 hours before deciding on any important change. Such a policy would be unacceptably slow-moving in many of today's DevOps and agile programming environments. But the Apache team appreciated the stability it brought.

How, then, to create a situation where a feature could be added if it had a certain minimum of support? This is where the famous +1 system came into being. It was almost an accidental innovation.

Rob Hartill, who played a big role on the team, found in March 1995 that he was falling behind in reviewing patches. To simplify his review, he spontaneously invented a three-level system (-1, 0, and +1) for rating patches and introduced it in a mail message.

People on mailing lists nowadays recognize the +1 convention. Its dominance is shown by its formalization on both GitHub and GitLab. Anyone working on a project can click a thumbs-up or thumbs-down button on an issue.

But the convention got much more sophisticated at Apache. According to van Gulik, on some occasions at least the numbers embody specific promises:

+1 — Not only do I support this proposal, but I will put effort into maintaining and fixing it.
-1 — I oppose this proposal, but if it is accepted, I will still put effort into maintaining and fixing it.
-0 — I oppose this proposal and will not maintain it.
+0 — I support this proposal but will not maintain it.
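The four votes listed above carry two pieces of information each: whether the voter supports the proposal, and whether the voter promises to maintain it. The following Python sketch shows one way to tally them under that reading. It is an illustration only, not a tool the Apache project uses, and the names (VOTE_MEANINGS, Tally, tally_votes, and the voters) are hypothetical.

```python
from dataclasses import dataclass

# Each vote maps to (supports the proposal, promises to maintain it),
# following the meanings van Gulik described.
VOTE_MEANINGS = {
    "+1": (True, True),
    "-1": (False, True),
    "-0": (False, False),
    "+0": (True, False),
}


@dataclass
class Tally:
    supporters: int = 0
    opponents: int = 0
    maintainers: int = 0


def tally_votes(votes):
    """Summarize a mapping of voter name -> vote string."""
    result = Tally()
    for voter, vote in votes.items():
        if vote not in VOTE_MEANINGS:
            raise ValueError(f"unrecognized vote {vote!r} from {voter}")
        supports, maintains = VOTE_MEANINGS[vote]
        if supports:
            result.supporters += 1
        else:
            result.opponents += 1
        if maintains:
            result.maintainers += 1
    return result


if __name__ == "__main__":
    # Hypothetical voters, one for each kind of vote.
    print(tally_votes({"alice": "+1", "bob": "+0", "carol": "-1", "dave": "-0"}))
```

Counting maintainers separately from supporters reflects the "bias toward action": a proposal with plenty of approval but nobody willing to look after it is a different situation from one with a committed maintainer.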

Persistent Connections

Apache remains a crucial web server, the most popular in the field. For building open source communities, the lessons learned by creating the project still resonate throughout the open source world. Every project is advised to respect the Apache value of "community over code."

In the late 2000s, community leaders came together to formalize best practices in free software development. The year 2009 saw the publication of Jono Bacon's Art of Community, based on his years of experience in open source. At the 2009 Open Source Convention, run by his publisher O'Reilly Media, Bacon introduced a Community Leadership Summit, which went on to run for a decade in many places around the world. (I edited the book, and participated in most of the Community Leadership Summits held at the Open Source Convention.)

To my mind, the open source community grew tremendously in its concern for community and culture during that decade. As I remember it, the first couple of Community Leadership Summits were attended by members of projects who had spontaneously taken on informal roles building and guiding their communities. Within a few years, Community Manager became a job description in many companies, especially in the open source space.

We have all learned a lot since 1995. Just as the world will never allow a critical part of internet infrastructure to go neglected (or so I hope—the 2014 Heartbleed vulnerability still weighs on my mind), the open source community will increasingly prioritize how it treats people. And that will lead to software that better meets people's needs.

Resources
