David A. Wheeler's Blog

Sun, 13 Jul 2014

Flawfinder version 1.28 released!

I’ve released yet another new version of flawfinder - now it’s version 1.28. Flawfinder is a simple program that examines C/C++ source code and reports on likely security flaws in the program, ranked by risk level.

This new version has some new capabilities. Common Weakness Enumeration (CWE) references are now included in most hits (this makes it easier to use in conjunction with other tools, and it also makes it easier to find general information about a weakness). The new version of flawfinder also has a new option to only produce reports that match a regular expression (e.g., you can report only hits with specific CWE values). This version also adds support for the git diff format.
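To make the regular-expression idea concrete, here is a minimal sketch in Python of filtering hits by CWE value. It is not flawfinder's code, and it shows neither flawfinder's actual command-line option nor its real output format. The hit strings and the `filter_hits` helper are made up for illustration.

```python
import re

# Made-up hit descriptions for illustration only; this is NOT
# flawfinder's actual output format.
HITS = [
    "demo.c:12 strcpy: may overflow the destination buffer (CWE-120)",
    "demo.c:30 printf: format string may be attacker-influenced (CWE-134)",
    "demo.c:41 memcpy: may overflow the destination buffer (CWE-120)",
]


def filter_hits(hits, pattern):
    """Return only the hits whose text matches the regular expression."""
    regex = re.compile(pattern)
    return [hit for hit in hits if regex.search(hit)]


if __name__ == "__main__":
    # Report only the hits tagged with CWE-120.
    for hit in filter_hits(HITS, r"CWE-120\b"):
        print(hit)
```

Because each hit carries its CWE identifier as plain text, an ordinary regular expression is enough to select one weakness class or several (for example, `CWE-(120|134)`).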

This new version also has a number of bug fixes. For example, it now handles files that do not end in a newline, and it more gracefully handles unbalanced double-quotes in sprintf. A bug in reporting the time executed has also been fixed.

For more information, or a copy, just go to my original flawfinder home page or the flawfinder project page on SourceForge.net. Enjoy!

path: /security | Current Weblog | permanent link to this entry

Tue, 10 Jun 2014

Interview on Application Security

A new interview of me is available: David A. Wheeler on the Current State of Application Security (by the Trusted Software Alliance) (alternate link). In this interview I discuss a variety of topics with Mark Miller, including the need for education in developing secure software, the need to consider security throughout the lifecycle, and the impact of componentization. I warn that many people do not include security (including software assurance) when they ask for quality; while I agree in principle that security is generally part of quality, in practice you have to specifically ask for security or you won’t get it.

This interview is part of their 50 in 50 interviews series, along with Joe Jarzombek (Department of Homeland Security), Steve Lipner (Microsoft), Bruce Schneier, Jeff Williams (Aspect Security and OWASP), and many others. It was an honor and pleasure to participate, and I hope you enjoy the results.

Wed, 21 May 2014

On Dave and Gunnar show

There is now an interview of me on the Dave and Gunnar show (episode #51). I talk mostly about How to prevent the next Heartbleed. I also talk about my FLOSS numbers database (as previously discussed) and vulnerability economics. There was even a mention of my Fully Countering Trusting Trust through Diverse Double-Compiling work.

Since the time of the interview, more information has surfaced about Heartbleed. Traditional fuzzing could not find Heartbleed, but it looks like some fuzzing variants could even if the OpenSSL code was unchanged; see the latest version for more information. If you learn more information relevant to the paper, let me know!

path: /oss | Current Weblog | permanent link to this entry

Thu, 08 May 2014

FLOSS numbers database!

If you are doing research related to Free / Libre / Open Source Software (FLOSS), then I have something that may be useful to you: the FLOSS numbers database.

My paper Why Open Source Software / Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers! is a big collection of quantitative studies about FLOSS. Too big, in fact. There have been a lot of quantitative studies about FLOSS over the years! A lot of people want to query this information for specific purposes, and it is hard to pull out just the parts you want from a flat document. I had thought that as FLOSS became more and more common, fewer people would want this information… but I still get requests for it.

So I am announcing the FLOSS numbers database; it provides the basic information in spreadsheet format, making it easy to query for just the parts you want. My special thanks go to Paul Rotilie, who worked to get the data converted from my document format into the spreadsheet.

If you want to discuss this database, I have set up a discussion group: Numbers about Free Libre Open Source Software. If you are doing research and need or use this kind of information, please feel free to join. If you just need a presentation based on this, you might like my Presentation: Why Free-Libre / Open Source Software (FLOSS or OSS/FS)? Look at the Numbers!.

This database is the sort of thing that if you need it, you really need it. I am sure it is incomplete… but I am also sure that with your help, we can make it better.

Sat, 03 May 2014

How to Prevent the next Heartbleed

My new article How to Prevent the next Heartbleed describes why the Heartbleed vulnerability in OpenSSL was so hard to find… and what could be done to prevent something like it next time.

Thu, 24 Apr 2014

Opensource.com interview

Opensource.com has posted an interview of me, titled “US government accelerating development and release of open source”. In this interview I describe the current state of the use of open source software by the US federal government, the challenges of the Federal acquisition system, and I also discuss what may happen next. Enjoy!

Thu, 20 Feb 2014

Presenting at American Society for Quality

On February 25, 2014, I will be presenting on “Open Source Software and Government” at the American Society for Quality (ASQ) Software SIG. You can join in person in McLean, Virginia; there will also be various video tele-conferencing sites, and you can join by phone or online as well.

If you’re interested, you’re welcome to join us, but you’ll need to pre-register.

Fri, 07 Feb 2014

William W. McCune: He made the world a better place through source code

Here I want to honor the memory of William W. (“Bill”) McCune, who helped change the world for the better by releasing software source code. I hope that many other researchers and government policy-makers will follow his lead… and below I intend to show why.

But first, I should explain my connection to him. My PhD dissertation involved countering the so-called “trusting trust” attack. In this attack, an attacker subverts the tools that developers use to create software. This turns out to be a really nasty attack. If a software developer’s tools are subverted, then the attacker actually controls the computer system running the software. This is no idle concern, either; we know that computers are under constant attack, and that some of these attacks are very sophisticated. Such subversions could allow attackers to essentially control all computers worldwide, including the global financial system, militaries, electrical systems, dams, you name it. That kind of power makes this kind of attack potentially worthwhile, but only if it cannot be detected and countered. For many years there were no good detection mechanisms or countermeasures. Then Henry Spencer suggested a potential solution… but there was no agreement that his idea would really counter attackers. That matters; how can you be absolutely certain about some claim?

The “gold standard” for knowing if something is true is a formal mathematical proof. Many important questions cannot be proved this way, all proofs depend on assumptions, and creating a formal proof is often hard. Still, a formal mathematical proof is the best guarantee we have for being certain about something. And there were a lot of questions about whether or not Henry Spencer’s approach would really counter this attack. So, I went about trying to prove that Henry Spencer’s idea really would counter the attack (if certain assumptions held).

After trying several other approaches, I found that the tools developed by Bill McCune (in particular prover9, mace4, and ivy) were perfect for my needs. These tools made my difficult work far easier, because his tools managed to mostly-automatically prove claims mathematically once they were described using mathematical statements. In the end, I managed to mathematically prove that Henry Spencer’s approach really did counter the subverted compiler problem. The tools Bill McCune developed and released made a real difference in helping to solve this challenging real-world problem. I didn’t need much help (because his tools were remarkably easy to use and well-documented), but he responded quickly when I emailed him too.

Sadly, Bill McCune suddenly died on May 4, 2011, leaving the field of automated reasoning deprived of one of its founders (particularly in the subfields of practical theorem proving and model building). In 2013 an academic book was released in his honor (“Automated Reasoning and Mathematics: Essays in Memory of William W. McCune”, Lecture Notes in Artificial Intelligence 7788). That book’s preface has a nice tribute to Bill McCune, listing some of his personal accomplishments (e.g., the development of Otter) and other accomplishments that his tools enabled.

Bill McCune released many tools as open source software (including prover9, mace4, ivy, and the older tool Otter). This means that anyone could use the software (for any purpose), modify it, and distribute it (with or without modification). These freedoms had far-reaching effects, accelerating research in automated proving of claims, as well as speeding the use of these techniques. That book’s preface notes several of Bill McCune’s accomplishments, including the impact he had by releasing the code:

All too often the U.S. government spends a fortune on research, and then that same research has to be recreated from scratch several times by other researchers (sometimes unsuccessfully). This is a tremendous waste of government money, and can delay work by years (if it can happen at all), resulting in far less progress for the money spent. Bill McCune instead ensured that these results got out to people who could use and improve upon them. In this specific area Bill McCune made software research available to many others, so that those others could use it, verify it, and build on top of those results.

Of course, he was not alone in recognizing the value of sharing research when implemented as software. The paper “The Evolution from LIMMAT to NANOSAT” by Armin Biere (April 2004) makes the same point, based on an attempt to reproduce others’ work. That paper states, “From the publications alone, without access to the source code, various details were still unclear… what we did not realize, and which hardly could be deduced from the literature, was [an optimization] employed in GRASP and CHAFF [was critically important]… Only [when CHAFF’s source code became available did] our unfortunate design decision became clear… The lesson learned is, that important details are often omitted in publications and can only be extracted from source code. It can be argued, that making source code … available is as important to the advancement of the field as publication.”

More generally, Free the Code.org argues that if government pays to develop software, then it should be available to others for reuse and sharing. That makes sense to me; if “we the people” paid to develop software, then by default “we the people” should receive it. I think it especially makes sense in science and research; without the details of how software works, results are not reproducible. Currently much of science is not reproducible (and thus not really science), though open science efforts are working to change this.

I think Bill McCune made great contributions to many, many, others. I am certainly one of the beneficiaries. Thank you, Bill McCune, so very much for your life’s work.

Sun, 01 Dec 2013

Shellcheck

I just learned about shellcheck, a tool that reports on common mistakes in (Bourne) shell scripts. If you write shell scripts, you should definitely check out this static analyzer. You can try it out by pasting shell scripts into their website. It is open source software, so you can also download and use it to your heart’s content.

It even covers some of the issues identified in Filenames and Pathnames in Shell: How to do it Correctly. If you are interested in static analyzers for software, you can also see my Flawfinder home page which identifies many other static analysis tools.

Sat, 16 Nov 2013

Vulnerability bidding wars and vulnerability economics

I worry that the economics of software vulnerability reporting is seriously increasing the risks to society. The problem is the rising bidding wars for vulnerability information, leading to a rapidly-growing number of vulnerabilities known only to attackers. These kinds of vulnerabilities, when exploited, are sometimes called “zero-days” because users and suppliers had zero days of warning. I suspect we should create laws limiting the sale of vulnerability information, similar to the limits we place on organ donation, to change the economics of vulnerability reporting. To see why, let me go over some background first.

A big part of the insecure software problem today is that relatively few of today’s software developers know how to develop software that resists attack (e.g., via the Internet). Many schools don’t teach it at all. I think that’s ridiculous; you’d think people would have heard about the Internet by now. I do have some hope that this will get better. I teach a graduate course on how to develop secure software at George Mason University (GMU), and attendance has increased over time. But today, most software developers do not know how to create secure software.

In contrast, there is an increasing bidding war for vulnerability information by organizations who intend to exploit those vulnerabilities. This incentivizes people to search for vulnerabilities, but not report them to the suppliers (who could fix them) and not alert the public. As Bruce Schneier reports in “The Vulnerabilities Market and the Future of Security” (June 1, 2012), “This new market perturbs the economics of finding security vulnerabilities. And it does so to the detriment of us all.” Forbes ran an article about this in 2012, Meet The Hackers Who Sell Spies The Tools To Crack Your PC (And Get Paid Six-Figure Fees). The Forbes article describes what happened when French security firm Vupen broke the security of the Chrome web browser. Vupen would not tell Google how they broke in, because the $60,000 award from Google was not enough. Chaouki Bekrar, Vupen’s chief executive, said that they “wouldn’t share this [information] with Google for even $1 million… We want to keep this for our customers.” These customers do not plan to fix security bugs; they purchase exploits or techniques with the “explicit intention of invading or disrupting”. Vupen even “hawks each trick to multiple government agencies, a business model that often plays its customers against one another as they try to keep up in an espionage arms race.” Just one part of the Flame espionage software (exploiting Microsoft Update) has been estimated as being worth $1 million when it was not known.

This imbalance in economic incentives creates a dangerous and growing mercenary subculture. You now have a growing number of people looking for vulnerabilities, keeping them secret, and selling them to the highest bidder… which will encourage more to look for, and keep secret, these vulnerabilities. After all, they are incentivized to do it. In contrast, the original developer typically does not know how to develop secure software, and there are fewer economic incentives to develop secure software anyway. This is a volatile combination.

Some think the solution is for suppliers to pay people when they report security vulnerabilities to suppliers (“bug bounties”). I do not think bug bounty systems (by themselves) will be enough, though suppliers are trying.

There has been a lot of discussion about Yahoo and bug bounties. On September 30, 2013, the article What’s your email security worth? 12 dollars and 50 cents according to Yahoo reported that Yahoo paid for each vulnerability only $12.50 USD. Even worse, this was not actual money, it was “a discount code that can only be used in the Yahoo Company Store, which sell Yahoo’s corporate t-shirts, cups, pens and other accessories”. Ilia Kolochenko, High-Tech Bridge CEO, says: “Paying several dollars per vulnerability is a bad joke and won’t motivate people to report security vulnerabilities to them, especially when such vulnerabilities can be easily sold on the black market for a much higher price. Nevertheless, money is not the only motivation of security researchers. This is why companies like Google efficiently play the ego card in parallel with [much higher] financial rewards and maintain a ‘Hall of Fame’ where all security researchers who have ever reported security vulnerabilities are publicly listed. If Yahoo cannot afford to spend money on its corporate security, it should at least try to attract security researchers by other means. Otherwise, none of Yahoo’s customers can ever feel safe.” Brian Martin, President of Open Security Foundation, said: “Vendor bug bounties are not a new thing. Recently, more vendors have begun to adopt and appreciate the value it brings their organization, and more importantly their customers. Even Microsoft, who was the most notorious hold-out on bug bounty programs realized the value and jumped ahead of the rest, offering up to $100,000 for exploits that bypass their security mechanisms. Other companies should follow their example and realize that a simple “hall of fame”, credit to buy the vendor’s products, or a pittance in cash is not conducive to researcher cooperation. 
Some of these companies pay their janitors more money to clean their offices, than they do security researchers finding vulnerabilities that may put thousands of their customers at risk.” Yahoo has since decided to establish a bug bounty system with larger rewards.

More recently, the Internet Bug Bounty Panel (founded by Microsoft and Facebook) will reward public research into vulnerabilities with the potential for severe security implications to the public. It has a minimum bounty of $5,000. However, it certainly does not cover everything; they only intend to pay out for widespread vulnerabilities (wide range of products or end users), and plan to limit bounties to only severe vulnerabilities that are novel (new or unusual in an interesting way). I think this could help, but it is no panacea.

Bug bounty systems are typically drastically outbid by attackers, and I see no reason to believe this will change.

Indeed, I do not think we should mandate, or even expect, that suppliers will pay people when people report security vulnerabilities to suppliers (aka bug bounties). Such a mandate or expectation could kill small businesses and open source software development, and it would almost certainly chill software development in general. Such payments would also not deal with what I see as a key problem: the people who sell vulnerabilities to the highest bidder. Mandating payment by suppliers would get most people to send them problem reports… if the bug bounty payments were required to be larger than payments to those who would exploit the vulnerability. That would be absurd, because given current prices, such a requirement would almost certainly prevent a lot of software development.

I think people who find a vulnerability in software should normally be free to tell the software’s supplier, so that the supplier can rapidly repair the software (and thus fix it before it is exploited). Some people call this “responsible disclosure”, though some suppliers misuse this term. Some suppliers say they want “responsible disclosure”, but they instead appear to irresponsibly abuse the term to stifle warning those at risk (including customers and the public), as well as irresponsibly delay the repair of critical vulnerabilities (if they repair the vulnerabilities at all). After all, if a supplier convinces the researcher to not alert users, potential users, and the public about serious security defects in their product, then these irresponsible suppliers may believe they don’t need to fix it quickly. People who are suspicious about “responsible disclosure” have, unfortunately, excellent reasons to be suspicious. Many suppliers have shown themselves untrustworthy, and even trustworthy suppliers need to have a reason to stay that way. For that and other reasons, I also think people should be free to alert the public in detail, at no charge, about a software vulnerability (so-called “full disclosure”). Although it’s not ideal for users, full disclosure is sometimes necessary; it can be especially justifiable when a supplier has demonstrated (through past or current actions) that he will not rapidly fix the problem that he created. In fact, I think it’d be an inappropriate constraint of free speech to prevent people from revealing serious problems in software products to the public.

But if we don’t want to mandate bug bounties, or so-called “responsible disclosure”, then where does that leave us? We need to find some way to change the rules so that economics works more closely with and not against computer security.

Well, here is an idea… at least one to start with. Perhaps we should criminalize selling vulnerability information to anyone other than the supplier or the reporter’s government. Basically, treat vulnerability information like organ donation: intentionally eliminate economic incentives in a specific area for a greater social good.

That would mean that suppliers can set up bug bounty programs, and researchers can publish information about vulnerabilities to the public, but this would sharply limit who else can legally buy the vulnerability information. In particular, it would be illegal to sell the information to organized crime, terrorist groups, and so on. Yes, governments can do bad things with the information; this particular proposal does nothing directly to address it. But I think it’s impossible to prevent a citizen from telling his country’s government about a software vulnerability; a citizen could easily see it as his duty. I also think no government would forbid buying such information for itself. However, by limiting sales to that particular citizen’s government, it becomes harder to create bidding wars between governments and other groups for vulnerability information. Without the bidding wars, there’s less incentive for others to find the information and sell it to them. Without the incentives, there would be fewer people working to find vulnerabilities that they would intentionally hide from suppliers and the public.

I believe this would not impinge on freedom of speech. You can tell no one, everyone, or anyone you want about the vulnerability. What you cannot do is receive financial benefit from selling vulnerability information to anyone other than the supplier (who can then fix it) or your own government (and that at least reduces bidding wars).

Of course, you always have to worry about unexpected consequences or easy workarounds for any new proposed law. An organization could set itself up specifically to find vulnerabilities and then exploit them itself… but that’s already illegal, so I don’t see a problem there. A trickier problem is that a malicious organization (say, the mob) could create a “supplier” (e.g., a reseller of proprietary software, or a downstream open source software package) that vulnerability researchers could sell their information to, working around the law. This could probably be handled by requiring, in law, that suppliers report (in a timely manner) any vulnerability information they receive to their relevant suppliers.

Obviously some people will do illegal things, but some people will avoid doing illegal things on principle, and others will avoid illegal activities because they fear getting caught. You don’t need to stop all possible cases, just enough to change the economics.

I fear that the current “vulnerability bidding wars” - left unchecked - will create an overwhelming tsunami of zero-days available to a wide variety of malicious actors. The current situation might impede the peer review of open source software (OSS), since currently people can make more money selling an exploit than in helping the OSS project fix the problem. Thankfully, OSS projects are still widely viewed as public goods, so there are still many people who are willing to take the pay cut and help OSS projects find and fix vulnerabilities. I think proprietary and custom software are actually in much more danger than OSS; in those cases it’s a lot easier for people to think “well, they wrote this code for their financial gain, so I may as well sell my vulnerability information for my financial gain”. The problem for society is that this attitude completely ignores the users and those impacted by the software, who can get hurt by the later exploitation of the vulnerability.

Maybe there’s a better way. If so, great… please propose it! My concern is that economics currently makes it hard - not easy - to have computer security. We need to figure out ways to get Adam Smith’s invisible hand to work for us, not against us.

Standard disclaimer: As always, these are my personal opinions, not those of employer, government, or (deceased) guinea pig.

Mon, 14 Oct 2013

Readable Lisp version 1.0.0 released!

Lisp-based languages have been around a long time. They have some interesting properties, especially when you want to write programs that analyze or manipulate programs. The problem with Lisp is that the traditional Lisp notation - s-expressions - is notoriously hard to read.

I think I have a solution to the problem. I looked at past (failed) solutions and found that they generally failed to be general or homoiconic. I then worked to find notations with these key properties. My solution is a set of notation tiers that make Lisp-based languages much more pleasant to work with. I’ve been working with many others to turn this idea of readable notations into a reality. If you’re interested, you can watch a short video or read our proposed solution.

The big news is that we have reached version 1.0.0 in the readable project. We now have an open source software (MIT license) implementation for both (guile) Scheme and Common Lisp, as well as a variety of support tools. The Scheme portion implements the SRFI-105 and SRFI-110 specs, which we wrote. One of the tools, unsweeten, makes it possible to process files in other Lisps as well.

So what do these tools do? Fundamentally, they implement the 3 notation tiers we’ve created: curly-infix-expressions, neoteric-expressions, and sweet-expressions. Sweet-expressions have the full set of capabilities.

Here’s an example of (awkward) traditional s-expression format:

(define (factorial n)
  (if (<= n 1)
    1
    (* n (factorial (- n 1)))))

Here’s the same thing, expressed using sweet-expressions:

define factorial(n)
  if {n <= 1}
    1
    {n * factorial{n - 1}}
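To make the curly-infix tier concrete, here is a minimal sketch in Python of how a simple curly-infix list such as {n <= 1} maps to the traditional prefix form (<= n 1). It is an illustration under simplifying assumptions, not the readable project's implementation: the expression is assumed to be already parsed into a Python list, the function name `curly_infix_to_prefix` is made up here, and only the "simple" case (an odd number of items, three or more, with the same operator in every even position) is handled.

```python
def curly_infix_to_prefix(items):
    """Convert a simple curly-infix list to prefix (s-expression) order.

    `items` holds the contents of {...} as a Python list,
    for example ["n", "<=", 1] for {n <= 1}.
    """
    operators = items[1::2]   # every second item: the infix operators
    operands = items[0::2]    # the remaining items: the operands
    is_simple = (
        len(items) >= 3
        and len(items) % 2 == 1
        and all(op == operators[0] for op in operators)
    )
    if not is_simple:
        # Other cases (empty lists, mixed operators, and so on) are
        # deliberately left out of this sketch.
        raise ValueError("not a simple curly-infix list")
    return [operators[0]] + operands


if __name__ == "__main__":
    print(curly_infix_to_prefix(["n", "<=", 1]))             # ['<=', 'n', 1]
    print(curly_infix_to_prefix(["a", "+", "b", "+", "c"]))  # ['+', 'a', 'b', 'c']
```

The point of the design is that the reader only reorders what it reads: the result is still an ordinary list, so the notation stays general and homoiconic.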

I even briefly mentioned sweet-expressions in my PhD dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling” (see section A.3).

So if you are interested in how to make Lisp-based languages easier to read, watch our short video about the readable notations or download the current version of the readable project. We hope you enjoy them.

path: /misc | Current Weblog | permanent link to this entry

Thu, 26 Sep 2013

Welcome, those interested in Diverse Double-Compiling (DDC)!

A number of people have recently been discussing or referring to my PhD work, “Fully Countering Trusting Trust through Diverse Double-Compiling (DDC)”, which counters Trojan Horse attacks on compilers. Last week’s discussion on reddit based on a short slide show discussed it directly, for example. There have also been related discussions such as Tor’s work on creating deterministic builds.

For everyone who’s interested in DDC… welcome! I intentionally posted my dissertation, and a video about it, directly on the Internet with no paywall. That way, anyone who wants the information can immediately get it. Enjoy!

I even include enough background material so other people can independently repeat my experiments and verify my claims. I believe that if you cannot reproduce the results, it is not science… and a lot of computational research has stopped being a science. This is not a new observation; “Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science” by Victoria C. Stodden (Computing in Science & Engineering, 2010) summarizes a roundtable on this very problem. The roundtable found that “Progress in computational science is often hampered by researchers’ inability to independently reproduce or verify published results” and, along with a number of specific steps, “reproducibility must be embraced at the cultural level within the computational science community.” “Does computation threaten the scientific method” (by Leslie Hatton and Adrian Giordani) and “The case for open computer programs” in Nature (by Darrel C. Ince, Leslie Hatton, and John Graham-Cumming) make similar points. For one of many examples, the paper “The Evolution from LIMMAT to NANOSAT” by Armin Biere (Technical Report #444, 15 April 2004) reported that they could not reproduce results because “From the publications alone, without access to the source code, various details were still unclear.” In the end they realized that “making source code… available is as important to the advancement of the field as publications”. I think we should not pay researchers, or their institutions, if they fail to provide the materials necessary to reproduce the work.

I do have a request, though. There is no patent on DDC, nor is there a legal requirement to report using it. Still, if you apply my approach, please let me know; I’d like to hear about it. Alternatively, if you are seriously trying to use DDC but are having some problems, let me know.

Again - enjoy!

Wed, 21 Aug 2013

Open security

Modern society depends on computer systems. Yet computer security problems let attackers subvert the very systems that society depends on. This is a serious problem.

I think one approach that could help is “open security” - applying open source software (OSS) approaches to help solve computer security problems. To see why, let’s look at some background.

Back in the 1970s people collaboratively developed software that today we would call open source software or free-libre software. At the time many assumed these approaches could not scale up to big systems… but they were wrong. Software systems that would cost over a billion U.S. dollars to redevelop have been developed as open source software, and Wikipedia has used similar approaches to collaboratively develop the world’s largest encyclopedia.

So… if we can collaboratively develop multi-billion-dollar software systems, and large encyclopedias, can we use the same kinds of collaborative approaches to improve computer security? I believe we can… but if we are going to do this, we need to define a term for this (so that we can agree on what we are doing!).

I propose that open security is the application of open source software (OSS) approaches to help solve cyber security problems. OSS approaches collaboratively develop and maintain intellectual works (including software and documentation) by enabling users to use them for any purpose, as well as study, create, change, and redistribute them (in whole or in part). Cyber security problems are a lack of security (confidentiality, integrity, and/or availability), or potential lack of security (a vulnerability), in computer systems and/or the networks they are a part of. In short, open security improves security through collaboration.

You can see more details in my paper What is open security? [PDF] [DOC]. I intentionally built on previous work such as the Free Software Definition by the Free Software Foundation (FSF), the Open Source Definition (Annotated) by the Open Source Initiative (OSI), the Creative Commons license work, and the Definition of Free Cultural Works by Freedom Defined (the last one is, for example, the basis of the Wikimedia/Wikipedia licensing policy).

The Open security site has been recently set up so that you and others can join and get involved. So please - get involved! We are only just starting, and the direction we go depends on the feedback we get.

path: /oss | Current Weblog | permanent link to this entry

Tue, 06 Aug 2013

Don’t anthropomorphize computers, they hate that

A lot of people who program computers or live in the computing world ‐ including me ‐ talk about computer hardware and software as if they are people. Why is that? This is not as obvious as you’d think.

After all, if you read the literature about learning how to program, you’d think that programmers would never use anthropomorphic language. “Separating Programming Sheep from Non-Programming Goats” by Jeff Atwood discusses teaching programming and points to the intriguing paper “The camel has two humps” by Saeed Dehnadi and Richard Bornat. This paper reported experimental evidence on why some people can learn to program, while others struggle. Basically, to learn to program you must fully understand that computers mindlessly follow rules, and that computers just don’t act like humans. As their paper said, “Programs… are utterly meaningless. To write a computer program you have to come to terms with this, to accept that whatever you might want the program to mean, the machine will blindly follow its meaningless rules and come to some meaningless conclusion… the consistent group [of people] showed a pre-acceptance of this fact: they are capable of seeing mathematical calculation problems in terms of rules, and can follow those rules wheresoever they may lead. The inconsistent group, on the other hand, looks for meaning where it is not. The blank group knows that it is looking at meaninglessness, and refuses to deal with it. [The experimental results suggest] that it is extremely difficult to teach programming to the inconsistent and blank groups.”

Later work by Saeed Dehnadi and sometimes others expands on this earlier work. The intermediate paper “Mental models, Consistency and Programming Aptitude” (2008) seemed to have refuted the idea that consistency (and ignoring meaning) was critical to programming, but the later “Meta-analysis of the effect of consistency on success in early learning of programming” (2009) added refinements and then re-confirmed this hypothesis. The reconfirmation involved a meta-analysis of six replications of an improved version of Dehnadi’s original experiment, and again showed that understanding that computers were mindlessly consistent was key in successfully learning to program.

So the good programmers know darn well that computers mindlessly follow rules. But many use anthropomorphic language anyway. Huh? Why is that?

Some do object to anthropomorphism, of course. Edsger Dijkstra certainly railed against anthropomorphizing computers. For example, in EWD854 (1983) he said, “I think anthropomorphism is the worst of all [analogies]. I have now seen programs ‘trying to do things’, ‘wanting to do things’, ‘believing things to be true’, ‘knowing things’ etc. Don’t be so naive as to believe that this use of language is harmless.” He believed that analogies (like these) led to a host of misunderstandings, and that those misunderstandings led to repeated multi-million-dollar failures. It is certainly true that misunderstandings can lead to catastrophe. But I think one reason Dijkstra railed particularly against anthropomorphism was (in part) because it is a widespread practice, even among those who do understand things ‐ and I see no evidence that anthropomorphism is going away.

The Jargon file specifically discusses anthropomorphization: “one rich source of jargon constructions is the hackish tendency to anthropomorphize hardware and software. English purists and academic computer scientists frequently look down on others for anthropomorphizing hardware and software, considering this sort of behavior to be characteristic of naive misunderstanding. But most hackers anthropomorphize freely, frequently describing program behavior in terms of wants and desires. Thus it is common to hear hardware or software talked about as though it has homunculi talking to each other inside it, with intentions and desires… As hackers are among the people who know best how these phenomena work, it seems odd that they would use language that seems to ascribe consciousness to them. The mind-set behind this tendency thus demands examination. The key to understanding this kind of usage is that it isn’t done in a naive way; hackers don’t personalize their stuff in the sense of feeling empathy with it, nor do they mystically believe that the things they work on every day are ‘alive’.”

Okay, so others have noticed this too. The Jargon file even proposes some possible reasons for anthropomorphizing computer hardware and software:

  1. It reflects a “mechanistic view of human behavior.” “In this view, people are biological machines - consciousness is an interesting and valuable epiphenomenon, but mind is implemented in machinery which is not fundamentally different in information-processing capacity from computers… Because hackers accept that a human machine can have intentions, it is therefore easy for them to ascribe consciousness and intention to other complex patterned systems such as computers.” But while the materialistic view of humans has respectable company, this “explanation” fails to explain why humans would use anthropomorphic terms about computer hardware and software, since they are manifestly not human. Indeed, as the Jargon file acknowledges, even hackers who have contrary religious views will use anthropomorphic terminology.
  2. It reflects “a blurring of the boundary between the programmer and his artifacts - the human qualities belong to the programmer and the code merely expresses these qualities as his/her proxy. On this view, a hacker saying a piece of code ‘got confused’ is really saying that he (or she) was confused about exactly what he wanted the computer to do, the code naturally incorporated this confusion, and the code expressed the programmer’s confusion when executed by crashing or otherwise misbehaving. Note that by displacing from “I got confused” to “It got confused”, the programmer is not avoiding responsibility, but rather getting some analytical distance in order to be able to consider the bug dispassionately.”
  3. “It has also been suggested that anthropomorphizing complex systems is actually an expression of humility, a way of acknowledging that simple rules we do understand (or that we invented) can lead to emergent behavioral complexities that we don’t completely understand.”

The Jargon file claims that “All three explanations accurately model hacker psychology, and should be considered complementary rather than competing.” I think the first “explanation” is completely unjustified. The second and third explanations do have some merit. However, I think there’s a simpler and more important reason: Language.

When we communicate with a human, we must use some language that will be more-or-less understood by the other human. Over the years people have developed a variety of human languages that do this pretty well (again, more-or-less). Human languages were not particularly designed to deal with computers, but languages have been honed over long periods of time to discuss human behaviors and their mental states (thoughts, beliefs, goals, and so on). The sentence “Sally says that Linda likes Tom, but Tom won’t talk to Linda” would be understood by any normal seven-year-old girl (well, assuming she speaks English).

I think a primary reason people use anthropomorphic terminology is that it’s much easier to communicate that way when discussing computer hardware and software using existing languages. Compare “the program got confused” with the overly long “the program executed a different path than the one expected by the program’s programmer”. Human languages have been honed to discuss human behaviors and mental states, so it is much easier to use languages this way. As long as both the sender and receiver of the message understand the message, the fact that the terminology is anthropomorphic is not a problem.

It’s true that anthropomorphic language can confuse some people. But the primary reason it confuses some people is that they still have trouble understanding that computers are mindless ‐ that computers simply do whatever their instructions tell them. Perhaps this is an innate weakness in some people, but I think that addressing this weakness head-on can help counter it. This is probably a good reason for ensuring that people learn a little programming as kids ‐ not because they will necessarily do it later, but because computers are so central to the modern world that people should have a basic understanding of them.

path: /misc | Current Weblog | permanent link to this entry

Thu, 20 Jun 2013

Industry-wide Misunderstandings of HTTPS (SSL/TLS)

Industry-wide Misunderstandings of HTTPS describes a nasty security problem involving HTTPS (SSL/TLS) and caching. The basic problem is that developers of web applications do not know or understand web standards. The result: 70% of sites tested expose private data on users’ machines by recording data that is supposed to be destroyed.

Here’s the abstract: “Most web browsers, historically, were cautious about caching content delivered over an HTTPS connection to disk - to a greater degree than required by the HTTP standard. In recent years, in response to the increased use of HTTPS for non-sensitive data, and the proliferation of bandwidth-hungry AJAX and Web 2.0 sites, some browsers have been changed to strictly follow the standard, and cache HTTPS content far more aggressively than before. HTTPS web servers must explicitly include a response header to block standards-compliant browsers from caching the response to disk - and not all web developers have caught up to the new browser behavior. ISE identified 21 (70% of sites tested) financial, healthcare, insurance and utility account sites that failed to forbid browsers from storing cached content on disk, and as a result, after visiting these sites, unencrypted sensitive content is left behind on end-users’ machines.”

This vulnerability isn’t as easy to exploit as some other problems; it just means that data that should have been destroyed is hanging around. But it does set up serious problems, because anyone who later gains access to the machine may be able to read that data.

This is really just yet another example of the security problems that can happen when people assume, “the only web browser is Internet Explorer 6”. That was never true, and by ignoring standards, they set themselves up for disaster. This isn’t even a new standard; HTTP version 1.1 was released in 1999, so there’s been plenty of time to fix things. Today, many modern systems use AJAX, and SSL/TLS encryption is far more widely used as well, and given these changing conditions, web browsers are changing in standards-compliant ways. Web application developers who followed the standard are doing just fine. The web application developers who ignored the standards are, once again, putting their users at risk.
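
To make the abstract’s point about response headers concrete, here is a minimal Python sketch. It assumes that “Cache-Control: no-store” is the header in question (the abstract does not name it; no-store is the HTTP/1.1 directive that tells caches not to store a response). The function names are hypothetical, and the directive parsing is deliberately simplified, so treat this as an illustration rather than a complete checker.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into lowercase directive names.

    Simplification: this splits on commas and ignores directive arguments,
    so it does not handle commas inside quoted arguments.
    """
    directives = set()
    for part in value.split(","):
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    return directives


def forbids_storage(headers):
    """Return True if the response headers include Cache-Control: no-store.

    `headers` is a plain dict of header name -> value (hypothetical input
    format chosen for this sketch). Header names are matched ignoring case.
    """
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "cache-control":
            value = header_value
    return "no-store" in parse_cache_control(value)


def add_no_store(headers):
    """Return a copy of `headers` with Cache-Control replaced by no-store."""
    updated = {k: v for k, v in headers.items() if k.lower() != "cache-control"}
    updated["Cache-Control"] = "no-store"
    return updated


# A response for a page with private data, before and after the fix.
before = {"Content-Type": "text/html", "Cache-Control": "private, max-age=0"}
after = add_no_store(before)
print(forbids_storage(before))  # False: nothing here forbids storing to disk
print(forbids_storage(after))   # True
```

Note that in this sketch “private” alone does not count as forbidding storage: it restricts shared caches, but a browser’s own cache may still keep the response.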

path: /security | Current Weblog | permanent link to this entry

Tue, 30 Apr 2013

OSS License Clinic

If you’re interested in understanding the legal, contract, or government acquisition issues in applying free / libre / open source software (FLOSS), come to the “Open Source License Clinic” on May 9, 2013, 9am-noon (EDT), in Washington, DC. This clinic will be hosted by the non-profit Open Source Initiative (OSI), and is “designed as a cross-industry, cross-community workshop for legal, contract, acquisition and program professionals who wish to deepen their understanding of open source software licenses, and raise their proficiency to better serve their organizations objectives as well as identify problems which may be unique to government. Discussion of licenses and issues in straight-forward terms make the clinic of value to anyone involved in the lifecycle of a technology decision/acquisition or strategy for internal software development.”

I’m one of the speakers, along with:

The location for the license clinic will be:

101 Independence Ave SE
Madison Building, 6th Floor, Dining Room A
Washington, DC 20540

You might also be interested in the Open Source Community Summit on May 10 (the following day) in Washington, DC.

path: /oss | Current Weblog | permanent link to this entry

Thu, 21 Mar 2013

French government OSS policy

Free/libre/open source software (FLOSS) continues to grow around the world, and governments around the world are trying to establish policies about it. Yet in the U.S. we often don’t hear about them. I just posted about a UK policy; here’s a recent French policy, translated into English.

In September 2012, the French administration established a set of guidelines and recommendations on the proper use of Free Software (aka open source software) in the French government. This is called the “Ayrault Memorandum” (circulaire Ayrault, in French) and was signed by the French Prime Minister. The document was mainly produced by the DISIC (the Department of Interministerial Systems Information and Communication) and the CIOs of some departments. The DISIC is in charge of coordinating the administration’s actions on information systems.

path: /oss | Current Weblog | permanent link to this entry

Mon, 18 Mar 2013

UK Government prefers OSS

The UK government is mandating a “preference” for open source software in its Government Service Design Manual Open Source section, to be effective April 2013. The draft manual says, “Use open source software in preference to proprietary or closed source alternatives, in particular for operating systems, networking software, web servers, databases and programming languages.”

path: /oss | Current Weblog | permanent link to this entry

Sun, 10 Mar 2013

Readable Lisp: Sweet-expressions

I’ve used Lisp-based programming languages for decades, but while they have some nice properties, their traditional s-expression notation is not very readable. Even the original creator of Lisp did not particularly like its notation! However, this problem turns out to be surprisingly hard to solve.

After reviewing the many past failed efforts, I think I have figured out why they failed. Past solutions typically did not work because they failed to be general (the notation is independent from any underlying semantics) or homoiconic (the underlying data structure is clear from the syntax). Once I realized that, I devised (with a lot of help from others!) a new notation, called sweet-expressions (t-expressions), that is general and homoiconic. I think this creates a real solution for an old problem.

You can try out sweet-expressions by downloading our new version 0.7.0 release from the Readable Lisp S-expressions Project.

If you’re interested, please participate! In particular, please participate in the SRFI-110 sweet-expressions (t-expressions) mailing list. SRFIs let people write specifications for extensions to the Scheme programming language (a Lisp), and this SRFI lets people in the Scheme community discuss it.

The following table shows an example of traditional (ugly) Lisp s-expressions, the same thing in sweet-expressions, and a short explanation.

s-expressions            Sweet-expressions (t-expressions)   Explanation
(define (fibfast n)      define fibfast(n)                   Typical function notation
  (if (< n 2)              if {n < 2}                        Indentation, infix {...}
    n                        n                               Single expr = no new list
    (fibup n 2 1 0)))        fibup n 2 1 0                   Simple function calls
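
As a rough illustration of the “infix {...}” row, here is a small Python sketch of how a simple curly-infix list such as {n < 2} can be mapped to the prefix form (< n 2). This is not the project’s implementation: Python lists stand in for Lisp lists, the function names are hypothetical, and the sketch only handles the simple case (an odd number of items, at least three, with the same operator in every even position). The real notation defines other cases that are not covered here.

```python
def curly_infix_to_prefix(items):
    """Map the items of a simple curly-infix list to prefix form.

    Example: ["n", "<", "2"] (that is, {n < 2}) becomes ["<", "n", "2"].
    Only the simple case is handled; anything else raises ValueError.
    """
    if len(items) < 3 or len(items) % 2 == 0:
        raise ValueError("sketch handles only an odd number of items (3 or more)")
    operator = items[1]
    # Every even-position item (1, 3, 5, ...) must be the same operator.
    if any(op != operator for op in items[1::2]):
        raise ValueError("sketch handles only a single repeated operator")
    # Operands are the items at positions 0, 2, 4, ...
    return [operator] + items[0::2]


def to_sexpr(expr):
    """Render nested Python lists as a traditional s-expression string."""
    if isinstance(expr, list):
        return "(" + " ".join(to_sexpr(e) for e in expr) + ")"
    return str(expr)


print(to_sexpr(curly_infix_to_prefix(["n", "<", "2"])))            # (< n 2)
print(to_sexpr(curly_infix_to_prefix(["a", "+", "b", "+", "c"])))  # (+ a b c)
```

The point of the sketch is that the transformation is purely syntactic: it needs no knowledge of what “<” or “+” means, which is what lets the notation stay general.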

path: /misc | Current Weblog | permanent link to this entry

Tue, 22 Jan 2013

Speaking at ACM DC Chapter

FYI, on 2013-03-04 I plan to speak about “Open Source Software, Government, and Cyber Security” at the Association for Computing Machinery (ACM), Washington, DC Chapter. It will be at 1203 19th St, 3rd Floor, Washington, DC. See the link for more information.

path: /oss | Current Weblog | permanent link to this entry