The central issue of forecasting is reasoning correctly about probability, which is largely a solved problem, yet very few forecasters apply that reasoning consistently.
The essence of probability and decision theory can be stated in a single page, though there are many additional wrinkles. Dense and mathematical as it is, I think some people making very important decisions will find this useful:
Synopsis of Ed Jaynes’ Probability Theory
Probability notation
AB = A and B
A + B = A or B (inclusive or, not exclusive-or)
a = not A, b = not B
(A|B) = probability of A given B
AA = A;  A(B+C) = AB + AC;  AB + a = ab + B
D = ab implies d = A + B (De Morgan)
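These identities are easy to verify mechanically. A small brute-force check (my own illustration, in Python) enumerates every truth assignment:

```python
# Brute-force check of the Boolean identities above over all truth assignments.
from itertools import product

for A, B, C in product([False, True], repeat=3):
    a, b = not A, not B                                   # lowercase = negation
    assert (A and A) == A                                 # AA = A
    assert (A and (B or C)) == ((A and B) or (A and C))   # A(B+C) = AB + AC
    assert ((A and B) or a) == ((a and b) or B)           # AB + a = ab + B
    D = a and b
    assert (not D) == (A or B)                            # D = ab implies d = A + B
print("all identities hold")
```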
Different chains of reasoning must not disagree; if they do, at least one of them is invalid.
The same state of knowledge in different problems must lead to assigning the same probabilities.
Consequently:
1.) (AB|C) = (A|BC)(B|C) = (B|AC)(A|C)
2.) (A|B) + (a|B) = 1 [ probability 1 = certainly true ]
3.) (A+B|C) = (A|C) + (B|C) – (AB|C)
4.) If {A_1 … A_n} is a mutually exclusive and exhaustive set of possible outcomes, and the information B is indifferent among them (uninformative about the outcome), then:
(A_i|B) = 1/n for i = 1 … n
From rule 1, Bayes’ Theorem:
(A|BC) = (A|C) (B|AC) / (B|C)
From rule 3, if {A_1 … A_n} are mutually exclusive:
(A_1 + … + A_n | B) = SUM[ (A_i|B) ]
If the A_i are also exhaustive, the chain rule (the law of total probability) follows:
(B|C) = SUM[ (B A_i | C) ] = SUM[ (B|A_i C) (A_i|C) ]
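To make the update concrete, here is a sketch (illustrative numbers of my own) that applies Bayes’ theorem across three mutually exclusive, exhaustive hypotheses, with the normalizer (B|C) computed by the chain rule:

```python
# Bayes update over mutually exclusive, exhaustive hypotheses A_1..A_n.
# priors[i] = (A_i|C); likelihoods[i] = (B|A_i C); posteriors[i] = (A_i|BC).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.9, 0.5, 0.1]

# Chain rule: (B|C) = SUM[ (B|A_i C)(A_i|C) ]
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' theorem: (A_i|BC) = (A_i|C)(B|A_i C)/(B|C)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
print(posteriors)   # ~[0.726, 0.242, 0.032]; still sums to 1
```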
Continuous distributions:
If x is a continuous variable, the probability, given A, that x lies in the range (x, x+dx) is:
(dx|A) = (x|A) dx
Rule 1 and Bayes’ theorem remain the same; summations become integrations.
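For instance, taking (x|A) to be a standard normal density (my choice, purely for illustration), summing (dx|A) = (x|A)dx over an interval gives the probability of that interval:

```python
import numpy as np

# (x|A) as a standard normal density, so (dx|A) = (x|A) dx.
dx = 1e-4
x = np.arange(-10.0, 10.0, dx)
density = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print((density * dx).sum())                       # total mass, ~1.0
print((density[(x >= 0) & (x <= 1)] * dx).sum())  # P(0 <= x <= 1), ~0.3413
```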
Prior probabilities:
The initial information is X; (A|X) is the prior probability of A. Use rule 4 when you have no information, MaxEnt otherwise.
Principle of maximum entropy (MaxEnt):
choose the (A_i|X) so as to maximize the entropy
H = – SUM[ p_i * log[p_i] ]
given the constraints of X.
For continuous distributions:
H = – ∫ p[x] * log[ p[x]/m[x] ] dx
where the measure m[x] is a weighting or normalizing function that does not change the probabilities given the prior information.
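As a worked discrete case, Jaynes’ Brandeis dice problem asks for the (p_1 … p_6) of a die whose average roll is known to be 4.5. The MaxEnt solution has the exponential form p_i ∝ exp(–λi); the sketch below (my own implementation) finds λ by bisection:

```python
import numpy as np

# MaxEnt over die faces 1..6 subject to a known mean of 4.5.
# The constrained maximum of H has the form p_i ∝ exp(-lam * i).
faces = np.arange(1, 7)

def mean_for(lam):
    w = np.exp(-lam * faces)
    p = w / w.sum()
    return (p * faces).sum()

lo, hi = -10.0, 10.0          # mean_for() decreases in lam; bracket the root
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_for(mid) > 4.5:   # mean too high -> need larger lam
        lo = mid
    else:
        hi = mid

p = np.exp(-lo * faces)
p /= p.sum()
print(p)                       # skewed toward the high faces
print(-(p * np.log(p)).sum())  # the maximized entropy H
```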
Using new evidence E and Bayes’ theorem gives the posterior probability (A|EX), often written (A|E).
Odds: O(A|EX) = (A|X)/(a|X) * (E|AX)/(E|aX) = O(A|X) * (E|AX)/(E|aX)
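In odds form the update is a single multiplication. A quick example with numbers of my own choosing: a hypothesis with a 1% prior, confronted with evidence ten times more likely under A than under a, still ends up below 10%:

```python
# Odds-form Bayes: O(A|EX) = O(A|X) * (E|AX)/(E|aX).
def posterior_odds(prior_prob, likelihood_ratio):
    prior_odds = prior_prob / (1 - prior_prob)
    return prior_odds * likelihood_ratio

odds = posterior_odds(0.01, likelihood_ratio=10.0)
print(odds)               # ~0.101
print(odds / (1 + odds))  # back to a probability: ~0.092
```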
Decision theory:
Given possible decisions D_1 … D_n and a loss function L(D_i, θ_j), the loss from choosing D_i when θ_j is the true state of nature, choose the D_i that minimizes the expected loss
<L_i> = SUM_j[ L(D_i, θ_j) * (θ_j|EX) ]
taken over the posterior distribution of θ_j.
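A minimal sketch of the rule (the loss matrix and posterior are made-up numbers):

```python
# Rows are decisions D_i, columns are states theta_j, entries are L(D_i, theta_j).
losses = [
    [0.0, 10.0],   # D_1: free if theta_1 holds, costly if theta_2
    [2.0,  2.0],   # D_2: a hedge, same loss either way
]
posterior = [0.7, 0.3]   # (theta_j|EX)

expected = [sum(L * p for L, p in zip(row, posterior)) for row in losses]
best = min(range(len(expected)), key=expected.__getitem__)
print(expected, "-> choose D_%d" % (best + 1))   # [3.0, 2.0] -> choose D_2
```

Note that D_2 wins even though D_1 is best in the most probable state: expected loss, not the modal state, drives the choice.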
The above rules apply to inductive inference in general, whether or not a frequency in a random process is involved.
General decision theory:
1. Enumerate the states of nature θ_j, discrete or continuous.
2. Assign prior probabilities (θ_j|X) that maximize the entropy subject to whatever information you have.
3. Digest any additional evidence E using Bayes’ theorem to obtain the posterior probabilities (θ_j|EX).
4. Enumerate the possible decisions D_i.
5. Specify the loss function L(D_i, θ_j) that tells you what you want to accomplish.
6. Make the decision D_i that minimizes the expected loss
<L_i> = SUM_j[ L(D_i, θ_j) * (θ_j|EX) ]
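Putting the six steps together on a toy problem of my own (whether to carry an umbrella):

```python
# 1. States of nature theta_j:
states = ["rain", "dry"]
# 2. Prior (theta_j|X): two outcomes, no other information -> rule 4 gives 1/2 each.
prior = [0.5, 0.5]
# 3. Evidence E (dark clouds), assumed four times as likely given rain:
likelihood = [0.8, 0.2]                               # (E|theta_j X)
norm = sum(l * p for l, p in zip(likelihood, prior))  # (E|X)
posterior = [l * p / norm for l, p in zip(likelihood, prior)]
# 4./5. Decisions D_i and loss function L(D_i, theta_j):
losses = {"umbrella":    [1.0, 1.0],    # a mild nuisance either way
          "no umbrella": [10.0, 0.0]}   # soaked if it rains
# 6. Choose the decision with the smallest expected posterior loss:
expected = {d: sum(L * p for L, p in zip(row, posterior))
            for d, row in losses.items()}
print(posterior)                        # [0.8, 0.2]
print(min(expected, key=expected.get))  # "umbrella"
```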
The Kelly Criterion extends decision theory to allocating money in betting and investment: it chooses bet sizes that maximize the expected logarithm of wealth, i.e. the long-run growth rate of the bankroll. For details, see Ed Thorp’s paper "The Kelly Criterion in Blackjack, Sports Betting and the Stock Market" (45 pp., PDF).
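For the simplest case, a binary bet at net odds b with win probability p, the criterion reduces to the standard one-line formula f* = (bp – q)/b (this sketch states the textbook formula, not any particular calculation from Thorp’s paper):

```python
# Kelly fraction for a binary bet: win probability p, net odds b
# (win b per unit staked, lose the stake otherwise).
def kelly_fraction(p, b):
    q = 1.0 - p
    return (b * p - q) / b   # fraction of bankroll; <= 0 means don't bet

print(kelly_fraction(0.55, 1.0))   # even-money bet, 55% win prob -> stake 10%
```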
None of this works unless you use it. A spreadsheet is the easiest way (label everything if you want to understand your calculations later).