How to Prove Stuff Automatically

David A. Wheeler


The real world is a complicated place; in many cases we don’t even know the probability of something happening. Nevertheless, there are a variety of circumstances where it is important to prove that, given certain reasonable assumptions, certain conclusions necessarily follow.

The desire to conclusively prove claims is an ancient desire. Euclid showed (in geometry) that it was possible to start with a short list of reasonable assumptions, called axioms, and then prove an astonishing array of results. In 1685 Gottfried Leibniz stated in “The Art of Discovery” that “The only way to rectify our reasonings is to make them as tangible as those of the Mathematicians, so... when there are disputes... we can simply say: Let us calculate... to see who is right.” At the time, the logic languages necessary to apply to many fields didn’t exist, and even if they did, the challenge of applying them manually would have made them less useful.

The good news is that today there are a variety of computer programs and logic systems (languages) that can help you conclusively prove claims about the real world. The bad news is that it’s often hard to find out how to use these tools. For example, many of their creators assume that users have a vast amount of mathematical background.

When I was writing my dissertation Fully Countering Trusting Trust through Diverse Double-Compiling I ended up having to learn a lot of poorly-documented “folklore”. In this paper, I briefly provide some “lessons learned” that I hope will help others who want to prove that certain claims are true.

Brief Lessons Learned

Here are some brief lessons learned:

  1. Determine your goal (claim) and the basic kinds of assumptions that you believe will prove it. You’ll need to do this in great detail to complete a real proof, but it’s important to at first get a basic sense of what you’re trying to prove. Creating a few figures and naming all the “parts” of your problem can help. Proofs always require that you assume something; trying to prove something from nothing is hopeless. So, the key is to clarify your goals, and look for assumptions that you can easily defend to others.
  2. Learn several different logic languages and their supporting tools, then choose the “weakest” language that can easily express your assumptions and goals and the tool that best meets your needs. There are many different mathematical/logic languages, each with their strengths and weaknesses. In general, the more “expressive” a language is, the harder it is to automate... and if it’s hard to automate, the human ends up with more work to do to prove things. I’ll talk more about this below.
  3. Tools matter. Some tools are nicer to work with than others; try out several before you select one.
  4. If you're trying to prove something about (a model of) the real world, then start with an extremely simple model of the world, prove a simplified claim about that, and then slowly expand. Trying to model the whole world at once will get you nowhere. By slowly expanding from a very simple model of the world that you can prove, you will immediately see what addition causes problems.
  5. Generate counter-examples if you cannot prove it. If you feed assumptions and a goal into a program, and the tool can’t prove it, feed the same thing to a “model checker” to generate a counter-example. (Obviously, this only works if there’s such a tool for your language.) The counter-example will show you an example of why you can’t prove something, and that may be helpful to you.
  6. Remember, you must provide all needed assumptions. A prover can only assume what you give it; if you don’t state it, it cannot assume anything. Some logic languages look like Prolog, but if you’re familiar with Prolog this can be misleading. Prolog presumes “negation as failure”, that is, if it can’t prove something is true, then Prolog assumes it’s false. General theorem provers make no such assumption! On the other hand, theorem provers include an “occurs check”, so the kinds of expressions that get Prolog programs into infinite loops are non-problems for theorem provers.
  7. Strive to have good, consistent names. Using clear names and a consistent naming convention can not only make the final proof clearer, it can also save a lot of time debugging things. A good name can keep you from making some mistakes in the first place, simply because some false claims will be “obviously” untrue. Many systems using naming conventions to distinguish between constants and variables, and some (like prover9) even let you choose between conventions. I find that the Prolog convention (uppercase are variables, lowercase are constants) is simpler to use when modelling the real world compared to some other conventions, because then it’s easier to have clear names. (The prover9 alternative is that terms beginning with u-z are variables, but this can lead to odd names if you’re modelling the real world.)
  8. Check that your assumptions are consistent. In most logic systems, if you baldly assume something inconsistent (such “a equals b” and “a is not equal to b”), then you can prove anything. Thus, once you have your proof, run your assumptions through a model-checker; if the model-checker can find a model, then the assumptions are satisfiable and thus consistent. See my dissertation Fully Countering Trusting Trust through Diverse Double-Compiling for an example of this. Beware: The word "model" has different meanings. In this paper and in casual conversation, the word "model" often means an abstract (and typically mathematical) representation of the real world. Model-checkers use a different technical definition of the word "model"; in model-checkers, a "model" can considered the set of assignments that can can meet a given set of assumptions.
  9. Use a separate verifier. Proof tools can have bugs, too. So, have the proof tool output the proof, and then run it through a separate “verifier” program that checks each step. Verifiers can be much simpler than a program for creating proofs, so they can act as a good test for whether a proof is true or not. Not all tools have verifiers, so you may want to prefer a tool where there’s a separate verifier. For example, prover9 proofs can be checked by ivy.
  10. Have others review the proof. A prover can prove that some claim is true, given the assumptions, but it cannot tell you if the claim or assumptions are adequate models of reality. Others can tell you where they think your model is inadequate for what you are trying to prove, warn you of dubious assumptions (maybe you can break those up into simpler assumptions), and so on.

Choosing a logic language and tool

Unless you want to do things by hand, you’ll need a tool. Different tools implement different logic languages, and of course, the language you use matters greatly. So in practice, you end up selecting both the logic language and the tool simultaneously. (This is a lot like selecting a programming language, really.)

Unfortunately, the current practice is that you need to learn a little bit about several different tools and their languages before you can decide if they’re right for your problem. That’s unfortunate, but probably unavoidable. In particular, tools tend to add little improvements or extensions that may or may not be helpful to you.

My paper High Assurance (for Security or Safety) and Free-Libre / Open Source Software (FLOSS) specifically discusses a number of FLOSS formal methods tools that may be useful to you. I recommend that you prefer FLOSS proof tools for these purposes; they cost nothing, aren’t tied to a particular vendor (avoiding vendor lock-in), won’t disappear on you in the future, and so on.

There are several different basic logic languages. Going from weakest to strongest, major types of languages include propositional logic, first-order logic (FOL), modal FOL, and higher-order logic (HOL). ([Schumann2001, page 73] has a slightly more complicated summary about language expressiveness.) Propositional language is too weak for most ordinary users; programs that implement propositional languages (such as SAT solvers) are usually building blocks for creating larger systems instead of being used directly by end-users.

I recommend that if you’re new to this, start by learning first-order logic (FOL), and see if you can make your problem fit inside FOL. The modal logic and higher-order logic (HOL) systems can be very useful, but they are more difficult to automate (thus requiring more human intervention). What’s more, it’s much easier to learn other systems once you know FOL.

In practice, real-world languages have lots of extensions and “built-in theorems”. At the least, almost every tool builds in equality (=), because that’s necessary for lots of real work. Some systems (like prover9) build in list handling, so you can handle arbitrary-length lists; these can be really useful. Many systems build in or add various forms of arithmetic; if you need arithmetic, you need to use such systems instead of trying to add them yourself. The program CVC3 is a good program to use if you need something that includes lots of arithmetic. Other useful programs include HOL, PVS, and (if you’re proving programs) ACL2.

I should warn you that the term “model” has two subtly different meanings. The non-technical term “model” means “a simplified representation of the real world”; in this sense, if you are trying to prove something about the real world, then your assumptions and goal(s) are models of the real world. There’s also a technical meaning of the term “model” that means “an assignment of constants that meet all the given constraints”; this is the kind of model that “model-checkers” find. Both meanings are used, and it can be confusing. Sorry.

You can see a little more about formal methods in general at my formal methods page.

First-Order Logic (FOL)

Many provers are based on “first-order logic” (FOL). If you can express your requirements using some version of FOL, then you can probably use lots of highly-automated tools that actually create the proof (instead of just checking it). Wikipedia’s “First-order logic” entry has some useful information. Some people use different symbols for the same concept, and may add or remove some operations, but once you know one variant it’s trivial to learn another variant.

My dissertation Fully Countering Trusting Trust through Diverse Double-Compiling section 5.2 gives a brief summary of FOL in 5.2. In short, in FOL, every expression is a term or a formula. A term (which denotes an object) is defined as: a variable, a constant, or a function application of form f(...) where each of the zero or more comma-separated parameters is a term. A formula (which denotes a truth value of true or false) is defined as being one of the following (if both A and B are formulas):
-A“not A”. If A is true, then -A is false; if A is false, then -A is true.
A & B“A and B”. Both A and B must be true for the expression to be true
A | B “A or B”. At least one of A or B must be true for the expression to be true.
A -> B “A implies B” or “if A, then B”
A <-> B“A implies B, and B implies A”. This is a short form of “(A -> B) & (B -> A)”
A <- B“B implies A”; this is just a sometimes-useful reversed form of “B -> A”
all X A “for all values of X, A is true”; X is a variable
exists X A “there exists some X where A is true”; X is a variable
T1 = T1 “T1 equals T2”; T1 and T2 must be terms
T1 != T2 “T1 is not equal to T2”
p(...)predicate (a “function” that returns a boolean); each of the 1 or more comma-separated parameters is a term.

For example, you can state “all men are mortal” in FOL as:

  % "For all X, if X is a man, then X is mortal"
  all X man(X) -> mortal(X).
You can express “Socrates is a man” in FOL as:
From these two statements, you can then prove that Socrates is mortal:

There are endless books and pages on FOL; a good library and/or a web search should get you started. “forall x: An introduction to Formal Logic” by P.D. Magnus is an open access introductory textbook in formal logic.

Here are some external pages on how to convert ideas into FOL:

Unfortunately, FOL has some real weaknesses that are easily extended, but many mathematicians don’t seem to realize that they are weaknesses. For example, you can’t define “if-then-else” easily in FOL. FOL has two “types”: formula (essentially booleans) and terms (everything else). You can define a multi-parameter object that returns a boolean (Predicate) or a term (function), but all parameters have to be terms (non-booleans) - and if-then-else inherently takes a boolean! This email notes why it’s important to add if-then-else for practical languages.


Prover9/Mace4 (GPLv2+ license) is a really neat combination of tools. Prover9 can be used to search for a proof, and mace4 can be used to search for a counter-example, which is an interesting combination. Prover9’s language is limited (it’s basically FOL), but if you can express your problem using its language, it’s extraordinarily good at automatically finding proofs or counter-examples. Prover9 includes support for equality and lists, and generally it has a "nice" syntax that's faily close to textbook formats (making it easy to use).

Prover9 is a "theorem prover" that works in a way similar to many other provers. First, it negates the "goal" statement (the idea is that you're trying to prove that there's some contradiction). Then, it tries to generate all possible facts from this set of assumptions (including the negated goal statement), until it finds a contradiction. Obviously, this is not the way that a human would work, but that's okay if you're simply trying to find out if something is true given certain assumptions. This works primarily because computers are remarkably fast, and the developers of prover9 (and other such programs) have honed heuristics that work in many cases.

Mace4 and prover9 include limited support for integer arithmetic. In Mace4 you can turn on arithmetic support with set(arithmetic), and it's fairly capable. Historically, prover9 couldn't handle arithmetic at all, but as of 2008-2009 its "production mode" can do this (use set(production) to turn this on). However, production mode disables many of prover9's other capabilities so this is much more limited (for example, it won't show the rewrites in the output). Many problems don't require calculation, and these limited capabilities may be sufficient for some problems that do. Prover9 has also recently added an if(boolean,if-term,else-term), which is very convenient but again has limitations. If your problem focuses on in-depth arithmetic calculations, prover9 is probably not the tool for you.

Prover9 is easy to invoke; just say “prover9 -f” on the command line, where gives the assumptions and goal to be proved. The prover9 documentation doesn’t describe its input format very clearly; here’s my attempt at documenting the prover9 formula syntax (in BNF).

There’s a paucity of small, simple examples of prover9 in use. It's hard to be simpler than, which is the "Socrates is mortal" proof in prover9 syntax.

Here’s the proof that the square root of 2 is irrational, as prover9 input and prover9 output. (This is a translation of the Otter proof from Larry Wos, by Michael Beeson and William McCune, as published in “The Seventeen Provers of the World” compiled by Freek Wiedijk). Here’s the square root of 2 proof, as a graphic.

Here’s another example,, which defines this predicate:

  distinct([First, Second : Rest]) ->
   ( (First != Second) &
     distinct([First : Rest]) &
     distinct([Second : Rest])).
The “distinct” predicate is useful all by itself, because “distinct” makes it easy to declare that a collection of terms are all different. Basically, “distinct” accepts a list of terms, and then declares that every term is different from all the other terms. For example, you can then state that a, b, c, and d are all different by stating “distinct([a,b,c,d]).”. This is also a useful example of how to do recursive list-handling in prover9.

Note: due to technical issues in prover9, if you include this definition of "distinct" in your assumptions but you never use it, then you may get a misleading message. If prover9 can find a proof but "distinct" is not used, it may output "SEARCH FAILED" followed by "Exiting with 1 proof" (or some other number). In such cases, prover9 did find a proof, but thought it had to keep searching.

Unfortunately, FOL doesn’t have any built-in notion of type, so programs like prover9 that implement FOL don’t have types either. You can implement types in prover9, using FOL; whether or not you should do this depends on your problem. If you add types where they aren’t needed, then the assumptions, goal, and proof may become much more complicated. In my dissertation I decided to discuss types in the informal text, but not use them in the actual proof. A trivial way to implement types is to simply make a predicate that is true if the object is a given type, e.g.:

  % human(X) is true if X is human.
  % penguin(X) is true if X is a penguin.
If there are only two types, then it’s easy to declare that if something is of one type, then it’s not of the other type. Here, we declare that any unknown X cannot be both a human and a penguin:
  -(human(X) & penguin(X)).
File shows how to implement simple types in prover9's first-order logic (FOL)

That doesn’t scale well to many types if you want to make claims about many types. If you have 30 different types, and an item is at most one of those types, it’d be a pain to directly declare that the same way. Thankfully, you can then use recursion (just like the “distinct” predicate above) to declare that instances of different types are themselves distinct. The file shows how to do this.

You can even define that certain types are subclasses of other classes, and so on, but again, you have to build that yourself. The file is a trivial example.

File demonstrates how to implement partitioned superclasses and subclasses (in a hierarchy), instances of types, and various facts about them using in prover9's first-order logic (FOL).

My dissertation Fully Countering Trusting Trust through Diverse Double-Compiling has some additional examples of using prover9.


I can’t really begin to scratch the surface here. I haven’t gone into a vast number of topics, but I hope this little paper has helped.

[Schumann2001] is not a bad introduction to using automated proof tools.


[Schumann2001] Schumann, Johann M. 2001. “Automated Theorem Proving in Software Engineering”. Berlin: Springer-Verlag. ISBN 3-540-67989-8

You might want to look at my Secure Programming HOWTO web page, Open Source Software and Software Assurance (Security), and High Assurance (for Security or Safety) and Free-Libre / Open Source Software (FLOSS). The open proofs page has information about open proofs.

You can also view my home page.