Welcome! You may find that the most interesting posts were posted first, that is, at the bottom of the page, starting in June 2014.

February 15, 2023

Decision Theory in a Nutshell

 

The central issue of forecasting is reasoning correctly about probability. That reasoning is largely a solved problem, yet very few forecasters apply it consistently.

The essence of probability and decision theory can be stated in a single page, though there are many additional wrinkles. The synopsis below is long and mathematical, but I think some people making very important decisions will find it useful:

Synopsis of Ed Jaynes’ Probability Theory

Probability notation
AB = A and B
A + B = A or B (and/or, not exclusive-or)
a = not A, b = not B
(A|B) = probability of A given B
AA = A
A(B+C) = AB + AC
AB + a = ab + B
If D = ab, then d = A + B (De Morgan)
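
These identities are ordinary Boolean algebra, so they can be checked mechanically. A minimal Python sketch that brute-forces every truth assignment:

from itertools import product

# Check each identity above for all combinations of truth values.
for A, B, C in product([True, False], repeat=3):
    a, b = not A, not B
    assert (A and A) == A                                # AA = A
    assert (A and (B or C)) == ((A and B) or (A and C))  # A(B+C) = AB + AC
    assert ((A and B) or a) == ((a and b) or B)          # AB + a = ab + B
    D = a and b
    assert (not D) == (A or B)                           # De Morgan: d = A + B
print("all identities hold")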

Different chains of reasoning must not disagree; if they do then at least one chain of reasoning is invalid.

The same state of knowledge in different problems must lead to assigning the same probabilities.

Consequently:
1.) (AB|C) = (A|BC)(B|C) = (B|AC)(A|C)
2.) (A|B) + (a|B) = 1 , [ probability 1 represents certainty ]
3.) (A+B|C) = (A|C) + (B|C) – (AB|C)
4.) If {A_1…A_n} are a mutually exclusive and exhaustive enumeration of the possible outcomes, and the information B gives no reason to favor any one outcome over another, then:
(A_i|B) = 1/n for i = 1 … n

From rule 1., Bayes’ Theorem:
(A|BC) = (A|C) (B|AC) / (B|C)

From rule 3, if {A_1…A_n} are mutually exclusive:
(A_1 + … +A_n | B) = SUM[ (A_i | B) ]

If the A_i are also exhaustive, then applying rule 1 inside the sum gives the law of total probability:
(B|C) = SUM[ (BA_i | C) ] = SUM[ (B | A_i C) (A_i | C) ]
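
A worked example makes the machinery concrete. Take A = "hypothesis is true" with a 1% prior, and evidence B that is 19 times more likely when A is true than when it is false (all numbers illustrative). In Python:

# Bayes' theorem, with the denominator (B|C) expanded by the
# law of total probability over the alternatives {A, a}.
p_A    = 0.01   # prior (A|C)
p_B_A  = 0.95   # likelihood (B|AC)
p_B_nA = 0.05   # likelihood (B|aC)

p_B = p_B_A * p_A + p_B_nA * (1 - p_A)   # (B|C) = SUM[ (B|A_i C)(A_i|C) ]
p_A_B = p_B_A * p_A / p_B                # (A|BC) = (A|C)(B|AC)/(B|C)
print(p_B, p_A_B)                        # 0.059, ~0.161

Note the result: even fairly strong evidence only lifts a 1% prior to about 16%.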

Continuous distributions:
If x is continuously variable, the probability, given A, that x lies in the range (x, x+dx) is:
(dx|A) = (x|A)dx
where (x|A) is now a probability density. Rule 1 and Bayes' theorem remain the same; summations become integrations
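
As a quick numeric check that these densities behave as stated, here is a sketch with an assumed exponential density; summing (x|A)dx over a fine grid recovers normalization and interval probabilities:

import numpy as np

# Assumed example density: (x|A) = rate * exp(-rate * x) on x >= 0.
rate = 2.0
dx = 1e-4
x = np.arange(0.0, 10.0, dx)
density = rate * np.exp(-rate * x)

total = (density * dx).sum()             # SUM[(x|A)dx] ~ 1 (normalization)
mask = (x >= 0.5) & (x <= 1.0)
p_interval = (density[mask] * dx).sum()  # ~ e^-1 - e^-2 ~ 0.233
print(total, p_interval)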

Prior probabilities:
The initial information is X,
(A|X) is the prior probability of A; use rule 4 when X gives no reason to favor any alternative, MaxEnt otherwise

Principle of maximum entropy (MaxEnt):
choose the (A_i | X) so as to maximize entropy
H = – SUM[p_i * log[p_i]] given the constraints of X.
For continuous distributions:
H = – ∫ p[x] * log[ p[x]/m[x] ] dx
where m[x] is a measure (weighting) function, fixed by the prior information, that keeps the entropy invariant under a change of variables.
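
A classic illustration is Jaynes' Brandeis dice problem: among all distributions over faces 1…6 with mean 4.5, find the one that maximizes H. The MaxEnt solution has the form p_i proportional to exp(–λi); the sketch below (the numbers are the standard textbook ones, the code is illustrative) finds λ by bisection:

import numpy as np

faces = np.arange(1, 7)
target_mean = 4.5

def mean_for(lam):
    """Mean of the MaxEnt-form distribution p_i ~ exp(-lam * i)."""
    w = np.exp(-lam * faces)
    p = w / w.sum()
    return (faces * p).sum()

# mean_for() decreases as lam increases, so bisect for the target mean.
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_for(mid) > target_mean:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)
w = np.exp(-lam * faces)
print(np.round(w / w.sum(), 4))  # weights tilt toward the high faces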

Using new evidence E and Bayes’ theorem gives the posterior probability:
(A|EX), often written (A|E).

Odds O(A|EX) = (A|X)/(a|X) * (E|AX)/(E|aX)
= O(A|X) * (E|AX)/(E|aX)
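
In this odds form, independent pieces of evidence multiply in as likelihood ratios, one at a time. A small sketch (same illustrative numbers as the example above, applied twice):

def to_odds(p): return p / (1 - p)
def to_prob(o): return o / (1 + o)

prior = 0.01                       # (A|X)
likelihood_ratios = [19.0, 19.0]   # (E|AX)/(E|aX) for two independent tests

odds = to_odds(prior)
for lr in likelihood_ratios:
    odds *= lr                     # O(A|EX) = O(A|X) * product of ratios
print(to_prob(odds))               # ~0.785: two 19:1 updates on a 1% prior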

Decision theory:
Given possible decisions D_1…D_n and a loss function L( D_i , θ_j ), the loss from choosing D_i when θ_j is the true state of nature, choose the D_i that minimizes the expected loss <L_i> = SUM_j [ L( D_i , θ_j ) * ( θ_j |EX) ] over the posterior distribution of θ_j .

The above rules apply to inductive inference in general, whether or not a frequency in a random process is involved.

General decision theory:
1. Enumerate the states of nature θ_j, discrete or continuous
2. Assign prior probabilities ( θ_j|X) which maximize the entropy subject to whatever information you have
3. Digest any additional evidence E using Bayes’ theorem to obtain posterior probabilities
(θ_j|EX)
4. Enumerate the possible decisions D_i
5. Specify the loss function L( D_i , θ_j ) that tells you what you want to accomplish
6. Make the decision D_i that minimizes the expected loss (see the sketch below)
<L_i> = SUM_j [ L( D_i , θ_j ) * ( θ_j |EX) ]
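
A minimal sketch of steps 1–6 with made-up states, decisions, and losses (the umbrella example is purely illustrative):

import numpy as np

# 1.-3. States of nature and their posterior (theta_j|EX), assumed given:
posterior = np.array([0.8, 0.2])   # theta_1 = no rain, theta_2 = rain

# 4.-5. Decisions D_i and losses L(D_i, theta_j):
#                 no rain  rain
loss = np.array([[  0.0,  10.0],   # D_1: carry nothing
                 [  1.0,   1.0],   # D_2: carry an umbrella
                 [  0.2,   5.0]])  # D_3: stuff a cheap poncho in a bag

# 6. Expected loss <L_i> = SUM_j L(D_i, theta_j)(theta_j|EX); take the min:
expected_loss = loss @ posterior
print(expected_loss, "-> choose D_%d" % (expected_loss.argmin() + 1))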

The Kelly criterion extends decision theory to allocating money in betting and investment: rather than maximizing expected gain (which would tell you to stake everything), it maximizes the long-run growth rate of capital, i.e. the expected logarithm of wealth. For details, see Ed Thorp's paper: "The Kelly Criterion in Blackjack, Sports Betting and the Stock Market" (45pp. PDF)
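
For the simplest case, a repeated binary bet with win probability p and net odds b (you win b per unit staked), the Kelly fraction reduces to f* = (pb - q)/b with q = 1 - p. A minimal sketch (parameters illustrative):

def kelly_fraction(p, b):
    """Growth-optimal fraction of bankroll for a binary bet; never negative."""
    q = 1.0 - p
    return max(0.0, (p * b - q) / b)

print(kelly_fraction(0.55, 1.0))   # even-money bet, 55% to win -> stake 10%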

None of this works unless you use it. A spreadsheet is the easiest way (label everything if you want to understand your calculations later).