David A. Wheeler's Blog

Sun, 08 Jul 2012

How to have a successful open source software (OSS) project: Internet Success

The world of the future belongs to the collaborators. But how, exactly, can you have a successful project with collaborators? Can we quantitatively analyze past projects to figure out what works, instead of just using our best guesses? The answer, thankfully, is yes.

I just finished reading the amazing Internet Success: A Study of Open-Source Software Commons. This landmark book by Charles M. Schweik and Robert C. English of the Massachusetts Institute of Technology (MIT) presents the results of five years of painstaking quantitative research to answer this question: “What factors lead some open source software (OSS) commons (aka projects) to success, and others to abandonment?”

If you’re doing serious research in how collaborative development projects succeed (or not), you have to get this book. If you’re running a project, you should apply its results, and frankly, you’d probably get quite a bit of insight about collaboration from reading it. The book focuses specifically on the development of OSS, but as the authors note, many of its lessons probably apply elsewhere. Here’s a quick review and summary.

Schweik and English examined over 100,000 projects on SourceForge, using data from SourceForge and developer surveys. Their approach to data collection and analysis is spelled out in detail in the book; the key is that they took the time to deeply dive into it. Many previous studies have focused on just a few projects, and they summarize those; while those are useful, they don’t tell the whole story. Schweik and English instead cover a broad array of projects, using quantitative analysis instead of guesswork.

Fair warning: The book is quite technical. People who are not used to statistical analysis will find some parts quite mysterious, and they answer a lot of questions you might not even have thought to ask. Because this is serious scientific research, they carefully define terms, walk through a variety of data, and present an avalanche of data. The key, though, is that they managed to find useful answers from the data, and their results are actually quite understandable.

They spend a whole chapter (chapter 7) defining the terms “success” and “abandonment”. The definitions of these terms are key to the whole study, so it makes sense that they spend time to define them. Interestingly, they switched to the term “abandonment” instead of the more common term “failure”; they found that “many projects that had ceased collaborating would not be seen as failed projects”, e.g., because that project code had been absorbed into another project or the developer had improved their development skills (where this was their purpose).

They use a very simple project lifecycle model — projects begin in initiation, and once the project has made its first software release, it switches to growth. They also categorized projects as success, abandonment, or indeterminate. Combining these produces 6 categories of project: success initiation (SI); abandonment initiation (AI); success growth (SG); abandonment growth (AG); indeterminant initiation (II); and indeterminant growth (IG). Their operational definition of success initiation (SI) is oversimplified but easy to understand: an SI project has at least one release. Their operational definition for a success growth (SG) project is very generous: at least 3 releases, at least 6 months between releases, and has more than 10 downloads. Chapter 7 gives details on these; I note these here because it’s hard to follow most of the book without knowing these categories. I could argue that these are really too generous a definition of success, but even with those definitions, they had many projects which did not meet these definitions, and it is important to understand why (so that future projects would be more likely to succeed).

They had so much data that even supercomputers could not directly process it. Given today’s computing capabilities, that’s pretty amazing.

So, what did they learn? Quite a bit. A few specific points are described in chapter 12. For example, they had presumed that OSS projects with limited competition would be more successful, but the effect is actually mildly the other way; “successful growth (SG) projects are more frequently found in environments where there is more competition, not less”. Unsurprisingly, projects with financial backing are “much more likely to be successful than those that are not” once they are in growth stage; although financing had an effect, its effects were not as strong in initiation.

As with any research material, if you don’t have time for the details, it’s a good idea to jump to the conclusions, which in this book is chapter 13. So what does it say?

One of the key results is that during initiation (before first release), the following are the most important issues, in order of importance, for success in an OSS project:

“Put in the hours. Work hard toward creating your first release.” The details in chapter 11 tell the story: If the leader put in more than 1.5 hours per week (on average), the project was successful 73% of the time; if the leader did not, the project was abandoned 65% of the time. They are not saying that leaders should only put in 2 hours a week; instead, the point is that the leader must consistently put in time for the project to get to its first release.
“Practice leadership by administering your project well, and thinking through and articulating your vision as well as goals for the project. Demonstrate your leadership through hard work…”
“Establish a high-quality Web site to showcase and promote your project.”
“Create good documentation for your (potential) user and developer community.”
“Advertise and market your project, and communicate your plans and goals with the hope of getting help from others.”
“Realize that successful projects are found in both GPL-based and non-GPL-compatible situations.”
“Consider, at the project’s outset, creating software that has the potential to be useful to a substantial number of users.” Remarkably, the minimum number of users is surprisingly small; they estimate that successful growth stage projects typically have at least 200 users. In general, the more potential users, the better.

None of these are earth-shattering surprises, but now they are confirmed by data instead of being merely guessed at. In particular, some items that people have claimed are important, such as keeping complexity low, were not really supported as important. In fact, successful projects tended to have a little more complexity. That is probably not because a project should strive for complexity. Instead, I suspect both successful and abandoned projects often strive to reduce complexity — so it not really something that distinguishes them — and I suspect sometimes a project that focuses on user needs has to have more complexity than one that does not, simply because user needs can sometimes require some complexity.

Similarly, they had guidance for growth projects, in order of importance:

“Your goal should be to create a virtuous circle where others help to improve the software, thereby attracting more users and other developers, which in turn leads to more improvements in the software…” Do this the same way it is done in initiation: spending time, maintain goals and plans, communicate the plans, and maintain a high-quality project web site.” The user community should actively interacting with the development team.
“Advertize and market your project.” In particular, successful growth projects are frequently projects that have added at least one new developer in the growth stage.
Have some small tasks available for contributors with limited time.
Welcome competition. The authors were surprised, but noted that “competition seems to favor success”. Personally, I do not find this surprising at all. Competition often encourages others to do better; we have an entire economic system based on that premise.
Consider accepting offers of financing or paid developers (they can greatly increase success rates). This one, in particular, should surprise no one — if you want to increase success, pay someone to do it.
“Keep institutions (rules and project governance) as lean and informal as possible, but do not be afraid to move toward more formalization if it appears necessary.”

The also have some hints of how potential OSS users (consumers) can choose OSS that is more likely to endure. Successful OSS projects have characteristics like more than 1000 downloads, users participating in bug tracker and email lists, goals/plans listed, a development team that responds quickly to questions, a good web site, good user documentation, and good developer documentation. A larger development team is a good sign, too.

These are just some of the research highlights. For details, well, get the book!

If you’re looking for more detailed guidance on how to run an OSS project, then a good place to go is “Producing Open Source Software: How to Run a Successful Free Software Project” by Karl Fogel. If you want to do it with or in the U.S. government, you might look at Open Technology Development (OTD): Lessons Learned & Best Practices for Military Software - OSD Report, May 2011 (full disclosure: I am co-author). Both of them were written before these research results were reported, but I think they are all quite consistent with each other.

I want to give some extra kudos to the authors: They have made a vast amount of their data avaiable so that analysis can be re-done, and so that additional analysis can be done. (They held back some survey data due to personally-identifying information issues, which is reasonable enough). Science depends on repeatability, yet much of today’s so-called “science” does not publish its data or analysis software, and thus cannot be repeated… and thus is not science.

The book is not perfect. It’s big and rather technical in some spots, which will make it hard reading for some. An unfortunate blot is that, while they’re usually extremely precise, there are serious ambiguities in their discussion on licensing. In particular, they have fundamentally inconsistent definitions for the term “GPL-compatible” and “GPL-incompatible” throughout the book, making their license analysis results suspect. On page 22, they define the term “GPL-incompatible” in an extremely bizarre and non-standard way; they define “GPL-incompatible” as software in which “firms can derive new works from OSS, but are not obliged to license new derivatives under the GPL [and] are not obligated to expose the code logic in [derivative products].” In short, they seem to using the term “GPL-compatible” as a synonym for what the rest of the world would call a “reciprocal” or “protective” license. Similarly, they seem to be defining the term “GPL-incompatible” to mean a “permissive” license. I don’t like non-standard terminology, but as long as unusual terms are defined clearly, I can deal with bizarre terminology. Yet later, on page 157, they define “GPL-compatible” completely differently, and give it its conventional meaning instead. That is, they define “GPL-compatible” as software that can be combined with the GPL (which includes not just the reciprocal GPL license, but which also includes many permissive licenses like the MIT license). My initial guess is that the page 22 text is just wrong, but it’s hard to be sure. There is another wrinkle, too, presuming that they meant the term “GPL-compatible” in the usual sense (and that page 22 is just wrong). One of the more popular licenses, the Apache License 2.0, has recently become GPL-compatible (on release of the GPL version 3), even though it wasn’t before. It’s not clear from the book that this is reflected in their data (at least I didn’t see it), if they actually used the term “GPL-compatible” in its usual sense, and there is enough Apache-licensed software that this would matter. This may just be a poor explanation of terms, but until this is cleared up, I would be cautious about its comments on licensing. Hopefully they will clear this up, and in addition, it would probably be very useful to re-run the licensing analysis to examine (1) GPL-compatible vs. GPL-incompatible, and (2) to examine the typical 3 license categories (permissive, weakly protective/reciprocal, and strongly protective/reciprocal).

So if you are interested in the latest research on how OSS projects become successful (or not), pick up Internet Success: A Study of Open-Source Software Commons. This book is a milestone in the serious study of collaborative development approaches.

What’s especially intriguing is that success is very achievable. While initiating your project you should keep at it and communicate (articulate the vision and goals, have a high-quality web site to showcase/promote the project, create good docuemntation, and advertize). Once it’s growing, work to attract more users and developers.

path: /oss | Current Weblog | permanent link to this entry