The central issue of forecasting is reasoning correctly about probability. That is largely a solved problem, yet very few forecasters actually apply consistent probabilistic reasoning.

The essence of probability and decision theory can be stated in a single page, though there are many additional wrinkles. The synopsis below is terse and mathematical, but I think some people making very important decisions will find it useful:

**Synopsis of Ed Jaynes’ *Probability Theory***

**Probability notation**

AB = A and B

A + B = A or B (and/or, not exclusive-or)

a = not A, b = not B

(A|B) = probability of A given B

AA = A

A(B+C) = AB + AC

AB + a = ab + B

If D = ab, then d = A + B (De Morgan’s law)
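These identities can be checked mechanically. A small sketch that exhaustively tests every truth assignment (A, B, C are Python booleans; juxtaposition is "and", "+" is inclusive "or"):

```python
from itertools import product

# Exhaustively verify the Boolean identities over all truth assignments.
for A, B, C in product([True, False], repeat=3):
    assert (A and A) == A                                        # AA = A
    assert (A and (B or C)) == ((A and B) or (A and C))          # A(B+C) = AB + AC
    assert ((A and B) or (not A)) == ((not A and not B) or B)    # AB + a = ab + B
    D = (not A) and (not B)
    assert (not D) == (A or B)                                   # d = A + B
print("all identities hold")
```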

Different chains of reasoning must not disagree; if they do then at least one chain of reasoning is invalid.

The same state of knowledge in different problems must lead to assigning the same probabilities.

Consequently:

1.) (AB|C) = (A|BC)(B|C) = (B|AC)(A|C)

2.) (A|B) + (a|B) = 1  [ probability 1 denotes certainty ]

3.) (A+B|C) = (A|C) + (B|C) – (AB|C)

4.) If {A_1 … A_n} is a mutually exclusive and exhaustive set of possible outcomes, and the information B is indifferent among them (uninformative about the outcome), then:

(A_i|B) = 1/n for i = 1 … n
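A quick numeric sanity check of rules 1–3, using an arbitrary made-up joint distribution for A and B (the background information C is left implicit):

```python
# p[(A, B)] = (AB|C); the numbers are invented for illustration.
p = {
    (True, True): 0.30, (True, False): 0.20,
    (False, True): 0.10, (False, False): 0.40,
}
pA = p[(True, True)] + p[(True, False)]        # (A|C)
pa = p[(False, True)] + p[(False, False)]      # (a|C)
pB = p[(True, True)] + p[(False, True)]        # (B|C)
pA_given_B = p[(True, True)] / pB              # (A|BC)
pB_given_A = p[(True, True)] / pA              # (B|AC)

# Rule 1: (AB|C) = (A|BC)(B|C) = (B|AC)(A|C)
assert abs(p[(True, True)] - pA_given_B * pB) < 1e-12
assert abs(p[(True, True)] - pB_given_A * pA) < 1e-12
# Rule 2: (A|C) + (a|C) = 1
assert abs(pA + pa - 1) < 1e-12
# Rule 3: (A+B|C) = (A|C) + (B|C) - (AB|C)
p_A_or_B = 1 - p[(False, False)]
assert abs(p_A_or_B - (pA + pB - p[(True, True)])) < 1e-12
print("rules 1-3 check out")
```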

From rule 1, **Bayes’ Theorem**:

(A|BC) = (A|C) (B|AC) / (B|C)
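As a worked example of Bayes’ theorem, the classic diagnostic-test calculation; all numbers are invented for illustration:

```python
prior = 0.01          # (A|C): base rate of the condition
sensitivity = 0.95    # (B|AC): P(positive test | condition)
false_pos = 0.05      # (B|aC): P(positive test | no condition)

# (B|C), summing over the two exclusive, exhaustive states A and a
evidence = sensitivity * prior + false_pos * (1 - prior)
posterior = prior * sensitivity / evidence    # (A|BC)
print(round(posterior, 3))                    # ~0.161
```

Even with a 95% accurate test, a positive result on a rare condition leaves the posterior well below one half: the base rate dominates.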

From rule 3, if {A_1 … A_n} are mutually exclusive:

(A_1 + … + A_n | B) = SUM[ (A_i|B) ]

If the A_i are also exhaustive, the chain rule (the law of total probability) follows:

(B|C) = SUM[ (BA_i|C) ] = SUM[ (B|A_i C) (A_i|C) ]
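A sketch of the chain rule with three mutually exclusive, exhaustive states of nature (all numbers made up):

```python
p_Ai = [0.2, 0.5, 0.3]           # (A_i|C), sums to 1
p_B_given_Ai = [0.9, 0.4, 0.1]   # (B|A_i C)

# (B|C) = SUM[ (B|A_i C)(A_i|C) ]
p_B = sum(b * a for b, a in zip(p_B_given_Ai, p_Ai))
print(round(p_B, 2))             # 0.41
```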

**Continuous distributions:**

If x is a continuous variable, the probability, given A, that x lies in the range (x, x+dx) is:

(dx|A) = (x|A)dx

Rule 1 and Bayes’ theorem remain the same; summations become integrations.
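As a sanity check on the continuous case, the analogue of rule 2 is that a density integrates to 1. A sketch, integrating a standard normal density numerically (stdlib only):

```python
import math

def density(x):
    # standard normal density, used purely as an example of (x|A)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

dx = 0.001
# Riemann sum over [-8, 8]; the tails beyond that are negligible.
total = sum(density(-8 + i * dx) * dx for i in range(int(16 / dx)))
print(round(total, 6))  # ~1
```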

**Prior probabilities:**

The initial information is X; (A|X) is the prior probability of A. Use rule 4 when you have no information distinguishing the outcomes, and MaxEnt otherwise.

**Principle of maximum entropy (MaxEnt)**:

Choose the (A_i|X) so as to maximize the entropy

H = – SUM[ p_i * log[p_i] ]

subject to the constraints of X.

For continuous distributions:

H = – ∫ p[x] * log[ p[x]/m[x] ] dx

where the measure m[x] is a weighting or normalizing function that does not change the probabilities given the prior information.
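With no constraint beyond normalization, MaxEnt reproduces rule 4: the uniform distribution maximizes H. A small numeric illustration, comparing the uniform distribution against many random ones:

```python
import math
import random

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 4
uniform = [1.0 / n] * n
random.seed(0)
# No random normalized distribution beats the uniform one.
for _ in range(10_000):
    raw = [random.random() for _ in range(n)]
    q = [r / sum(raw) for r in raw]
    assert entropy(q) <= entropy(uniform) + 1e-12
print(round(entropy(uniform), 4))  # log(4)
```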

Using new evidence E and Bayes’ theorem gives the **posterior probability** (A|EX), often written (A|E).

In odds form:

O(A|EX) = (A|X)/(a|X) * (E|AX)/(E|aX) = O(A|X) * (E|AX)/(E|aX)
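The odds form is convenient for chaining evidence: multiply the prior odds by one likelihood ratio per independent observation. A sketch with invented numbers (two independent tests, each with likelihood ratio 19):

```python
prior_odds = 0.01 / 0.99              # O(A|X) = (A|X)/(a|X)
likelihood_ratios = [19.0, 19.0]      # (E|AX)/(E|aX) for each piece of evidence

odds = prior_odds
for lr in likelihood_ratios:
    odds *= lr
posterior = odds / (1 + odds)         # convert odds back to a probability
print(round(posterior, 3))            # ~0.785
```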

**Decision theory:**

Given possible decisions D_1 … D_n and a loss function L(D_i, θ_j), the loss from choosing D_i when θ_j is the true state of nature, choose the D_i that minimizes the expected loss

<L_i> = SUM_j [ L(D_i, θ_j) * (θ_j|EX) ]

over the posterior distribution of the θ_j.
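A minimal expected-loss decision, two decisions against two states of nature; the loss matrix and posterior are invented for illustration:

```python
posterior = {"rain": 0.3, "dry": 0.7}             # (θ_j|EX)
loss = {                                           # L(D_i, θ_j)
    "take umbrella": {"rain": 0.0, "dry": 1.0},
    "leave it":      {"rain": 5.0, "dry": 0.0},
}
# <L_i> = SUM_j [ L(D_i, θ_j) * (θ_j|EX) ]
expected = {d: sum(loss[d][s] * posterior[s] for s in posterior) for d in loss}
best = min(expected, key=expected.get)
print(best)
```

Rain is unlikely, but getting soaked costs five times what carrying the umbrella does, so the umbrella minimizes expected loss.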

The above rules apply to inductive inference in general, whether or not a frequency in a random process is involved.

**General decision theory**:

1. Enumerate the states of nature θ_j, discrete or continuous.

2. Assign prior probabilities (θ_j|X) which maximize the entropy subject to whatever information you have.

3. Digest any additional evidence E using Bayes’ theorem to obtain the posterior probabilities (θ_j|EX).

4. Enumerate the possible decisions D_i.

5. Specify the loss function L(D_i, θ_j) that tells you what you want to accomplish.

6. Make the decision D_i which minimizes the expected loss

<L_i> = SUM_j [ L(D_i, θ_j) * (θ_j|EX) ]
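The six steps can be sketched end to end. Everything here is invented for illustration: two states of nature, a uniform MaxEnt prior, one piece of evidence, and two decisions:

```python
states = ["good", "bad"]                     # step 1: states of nature
prior = {s: 0.5 for s in states}             # step 2: MaxEnt with no constraints
likelihood = {"good": 0.8, "bad": 0.2}       # (E|θ_j X), assumed for the example

# step 3: Bayes' theorem
evidence = sum(likelihood[s] * prior[s] for s in states)
posterior = {s: likelihood[s] * prior[s] / evidence for s in states}

decisions = ["invest", "hold"]               # step 4: possible decisions
loss = {                                      # step 5: loss function
    "invest": {"good": 0.0, "bad": 10.0},
    "hold":   {"good": 3.0, "bad": 0.0},
}
# step 6: choose the decision minimizing expected loss
expected = {d: sum(loss[d][s] * posterior[s] for s in states) for d in decisions}
print(min(expected, key=expected.get))
```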

The Kelly criterion extends decision theory to allocating money so as to maximize the long-run growth rate of capital in betting and investment. For details, see Ed Thorp’s paper: "The Kelly Criterion in Blackjack, Sports Betting and the Stock Market" (45pp. PDF)
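For the simplest case, a binary bet, the Kelly fraction is f* = p – q/b, where p is the win probability, q = 1 – p, and b the net odds received on a win. A one-line sketch with illustrative numbers:

```python
def kelly_fraction(p, b):
    # f* = p - q/b for a binary bet with win probability p and net odds b
    return p - (1 - p) / b

# A 55% edge on an even-money bet: stake 10% of the bankroll.
print(round(kelly_fraction(0.55, 1.0), 2))
```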

None of this works unless you use it. A spreadsheet is the easiest way (label everything if you want to understand your calculations later).