Probability has always been a bit of a mystery to me. Manipulating basic probabilities according to the axioms of probability is fine. Venn diagrams representing joint distributions are fine. Reasoning about continuous probability is… fine. The calculations and algebra all make sense, but I never *got* it. Probability Theory: The Logic of Science by Edward Jaynes starts from the ground up in a way that makes sense to me. Not only because it has so far avoided measure theory and infinite sets, but also because the goal is to design a robot that can reason (I think that’s pretty rad).

# Aristotelian Logic

Jaynes argues for developing a system of probability that acts as an extension of Aristotelian logic rather than of infinite set theory.

Aristotelian logic consists of propositions \(\{A, B, C, \dots \}\) and premises of the form ‘if A then B’, which can be combined into two strong syllogisms:

Major Premise: If A is true then B is true

Minor Premise: A is true

Conclusion: B is true

Or

Major Premise: If A is true then B is true

Minor Premise: B is false

Conclusion: A is false
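The two strong syllogisms can be checked mechanically. A small sketch of my own (not from the book): enumerate every truth assignment consistent with the major premise, read as material implication, and confirm the conclusions hold in all of them.

```python
from itertools import product

# All (A, B) truth assignments consistent with the major premise
# "if A then B" (material implication: not A, or B).
worlds = [(a, b) for a, b in product([True, False], repeat=2)
          if (not a) or b]

# Modus ponens: in every consistent world where A is true, B is true.
assert all(b for a, b in worlds if a)

# Modus tollens: in every consistent world where B is false, A is false.
assert all(not a for a, b in worlds if not b)
```

The premise rules out only the world (A true, B false); deduction is just checking what holds in all the worlds that remain.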

However, how do we quantify the weaker argument:

Major Premise: If A is true then B is true

Minor Premise: B is true

Conclusion: A becomes more plausible
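As a preview of where the book is headed, ordinary conditional probability (which Jaynes eventually derives from his desiderata) quantifies this weak syllogism. The numbers below are made up for illustration: the premise ‘if A then B’ means \(P(B|A) = 1\), so by Bayes’ rule observing B multiplies the plausibility of A by \(1/P(B) \geq 1\).

```python
# Quantifying the weak syllogism with Bayes' rule.
# All numbers are made-up illustrations.
#   P(A|B) = P(B|A) * P(A) / P(B) = P(A) / P(B)  when P(B|A) = 1.

p_a = 0.3          # prior plausibility of A
p_b_given_a = 1.0  # the premise "if A then B"
p_b = 0.6          # B can happen without A, so P(B) < 1

p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.5 -- learning B raised A from 0.3 to 0.5
```

The weaker the prior expectation of B, the more its observation boosts A.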

How much more plausible is A, now that we know B? Real numbers can be used to quantify this, so we need a robot that can reason about propositions (A, B, C, …) and their relative plausibilities:

# Degrees of plausibility are represented by real numbers

We would like to assign a real number representing the plausibility that a proposition A is true given the knowledge that B is true. This is denoted

$$ A | B $$

And is pronounced ‘A given B’. This is a real number! It is not a probability and it is not confined to \([0, 1]\): in fact we know nothing about it yet other than that it is a real number.

# Qualitative correspondence with common sense

We then *choose* to view larger real numbers as representing a higher degree of plausibility. This is an intuitive and natural choice, but it is not necessary. We also would like consistency with the rules of Boolean logic: e.g. \(A + B \mid C\) represents the plausibility that at least one of A or B is true given C.

As for correspondence with common sense, for example: if C’ represents new information built upon C that makes A more plausible while leaving the plausibility of B given A unchanged, i.e.

$$(A|C') > (A|C) \quad \text{and} \quad (B|AC') = (B|AC),$$

then we expect

$$(AB|C') \geq (AB|C)$$
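A numeric illustration with made-up numbers, using ordinary probabilities as a stand-in for plausibilities (a special case). Via the product rule \(P(AB|C) = P(A|C)\,P(B|AC)\), raising the plausibility of A while leaving B-given-A untouched can only raise the plausibility of AB:

```python
# Made-up numbers illustrating the common-sense desideratum, with
# probabilities standing in for plausibilities.
# Product rule: P(AB|C) = P(A|C) * P(B|AC).

p_a_c,  p_b_ac  = 0.4, 0.7   # on prior information C
p_a_c2, p_b_ac2 = 0.6, 0.7   # on C': A more plausible, B given A unchanged

p_ab_c  = p_a_c  * p_b_ac
p_ab_c2 = p_a_c2 * p_b_ac2

# AB has become more plausible, matching common sense.
assert p_ab_c2 >= p_ab_c
```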

## Consistency

Jaynes posits the following consistency axioms for the robot:

1. If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.

2. The robot always takes into account all of the evidence it has relevant to a question. It does not arbitrarily ignore some of the information, basing its conclusions only on what remains. In other words, the robot is completely non-ideological.

3. The robot always represents equivalent states of knowledge by equivalent plausibility assignments. That is, if in two problems the robot’s state of knowledge is the same (except perhaps for the labeling of the propositions), then it must assign the same plausibilities in both.

And that’s all. The previous desiderata about common sense and real numbers, along with these consistency guarantees, are enough to develop a mathematical theory of plausible reasoning (i.e. probability theory). Notice there is no mention of the familiar axioms of probability theory. Although this is a different formulation, Jaynes insists that his theory agrees with Kolmogorov’s results without any of the messiness of measure theory and the paradoxes of infinite sets.

# What about \(P(x)\)?

Through some relatively straightforward derivation (and references to exhaustive proofs), Jaynes develops the familiar rules of probability. For example, we might like to know what \((AB|C)\) is, and the solution involves solving the functional equation

$$ (AB|C) = F[(B|C), (A|BC)] $$
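The solution (after the functional-equation argument and a regraduation of plausibilities into probabilities) is the familiar product rule, \(P(AB|C) = P(A|BC)\,P(B|C)\). A quick sanity check on a made-up joint distribution over two binary propositions: factoring the joint plausibility either way gives the same number, an instance of the consistency desideratum that every route to a conclusion must agree.

```python
# Numeric check of the product rule P(AB) = P(A|B) P(B) = P(B|A) P(A),
# using a made-up joint distribution over two binary propositions.
joint = {(True, True): 0.2, (True, False): 0.1,
         (False, True): 0.4, (False, False): 0.3}

p_ab = joint[(True, True)]                        # P(AB)
p_a = sum(p for (a, _), p in joint.items() if a)  # P(A)
p_b = sum(p for (_, b), p in joint.items() if b)  # P(B)

p_a_given_b = p_ab / p_b
p_b_given_a = p_ab / p_a

# Both factorings recover the same joint plausibility, as consistency demands.
assert abs(p_a_given_b * p_b - p_b_given_a * p_a) < 1e-12
```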

This is all in the first two chapters of the book. The rest goes on to develop familiar distributions and techniques using this robot. I’m still working through it, but the basic premise is compelling.