The central issue of forecasting is reasoning correctly about probability, which is largely a solved problem, yet very few forecasters apply that reasoning consistently.
The essence of probability and decision theory can be stated in a single page, though there are many additional wrinkles. Dense and mathematical as it is, I think some people making very important decisions will find this useful:
Synopsis of Ed Jaynes’ Probability Theory
Probability notation
AB = A and B
A + B = A or B (inclusive or, not exclusive-or)
a = not A, b = not B
(A|B) = probability of A given B
AA = A;  A(B+C) = AB + AC;  AB + a = ab + B
D = ab implies d = A + B (De Morgan)
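These identities are easy to verify mechanically. A small brute-force check (my own illustration, in Python) enumerates every truth assignment:

```python
# Brute-force check of the Boolean identities above over all truth assignments.
from itertools import product

for A, B, C in product([False, True], repeat=3):
    a, b = not A, not B                                   # lowercase = negation
    assert (A and A) == A                                 # AA = A
    assert (A and (B or C)) == ((A and B) or (A and C))   # A(B+C) = AB + AC
    assert ((A and B) or a) == ((a and b) or B)           # AB + a = ab + B
    D = a and b
    assert (not D) == (A or B)                            # D = ab implies d = A + B
print("all identities hold")
```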
Different chains of reasoning must not disagree; if they do, at least one of them is invalid.
The same state of knowledge in different problems must lead to assigning the same probabilities.
Consequently:
1.) (AB|C) = (A|BC)(B|C) = (B|AC)(A|C)
2.) (A|B) + (a|B) = 1 [ probability 1 = certainly true ]
3.) (A+B|C) = (A|C) + (B|C) – (AB|C)
4.) If {A_1 … A_n} is a mutually exclusive and exhaustive set of possible outcomes, and the information B is indifferent among them (uninformative about the outcome), then:
(A_i|B) = 1/n for i = 1 … n
From rule 1, Bayes’ Theorem:
(A|BC) = (A|C) (B|AC) / (B|C)
From rule 3, if {A_1 … A_n} are mutually exclusive:
(A_1 + … + A_n | B) = SUM[ (A_i|B) ]
If the A_i are also exhaustive, the chain rule (the law of total probability) follows:
(B|C) = SUM[ (B A_i | C) ] = SUM[ (B|A_i C) (A_i|C) ]
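To make the update concrete, here is a sketch (illustrative numbers of my own) that applies Bayes’ theorem across three mutually exclusive, exhaustive hypotheses, with the normalizer (B|C) computed by the chain rule:

```python
# Bayes update over mutually exclusive, exhaustive hypotheses A_1..A_n.
# priors[i] = (A_i|C); likelihoods[i] = (B|A_i C); posteriors[i] = (A_i|BC).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.9, 0.5, 0.1]

# Chain rule: (B|C) = SUM[ (B|A_i C)(A_i|C) ]
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' theorem: (A_i|BC) = (A_i|C)(B|A_i C)/(B|C)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
print(posteriors)   # ~[0.726, 0.242, 0.032]; still sums to 1
```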
Continuous distributions:
If x is a continuous variable, the probability, given A, that x lies in the range (x, x+dx) is:
(dx|A) = (x|A) dx
Rule 1 and Bayes’ theorem remain the same; summations become integrations.
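For instance, taking (x|A) to be a standard normal density (my choice, purely for illustration), summing (dx|A) = (x|A)dx over an interval gives the probability of that interval:

```python
import numpy as np

# (x|A) as a standard normal density, so (dx|A) = (x|A) dx.
dx = 1e-4
x = np.arange(-10.0, 10.0, dx)
density = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print((density * dx).sum())                       # total mass, ~1.0
print((density[(x >= 0) & (x <= 1)] * dx).sum())  # P(0 <= x <= 1), ~0.3413
```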
Prior probabilities:
The initial information is X; (A|X) is the prior probability of A. Use rule 4 when you have no information, MaxEnt otherwise.
Principle of maximum entropy (MaxEnt):
choose the (A_i|X) so as to maximize the entropy
H = – SUM[ p_i * log[p_i] ]
given the constraints of X.
For continuous distributions:
H = – ∫ p[x] * log[ p[x]/m[x] ] dx
where the measure m[x] is a weighting or normalizing function that does not change the probabilities given the prior information.
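As a worked discrete case, Jaynes’ Brandeis dice problem asks for the (p_1 … p_6) of a die whose average roll is known to be 4.5. The MaxEnt solution has the exponential form p_i ∝ exp(–λi); the sketch below (my own implementation) finds λ by bisection:

```python
import numpy as np

# MaxEnt over die faces 1..6 subject to a known mean of 4.5.
# The constrained maximum of H has the form p_i ∝ exp(-lam * i).
faces = np.arange(1, 7)

def mean_for(lam):
    w = np.exp(-lam * faces)
    p = w / w.sum()
    return (p * faces).sum()

lo, hi = -10.0, 10.0          # mean_for() decreases in lam; bracket the root
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_for(mid) > 4.5:   # mean too high -> need larger lam
        lo = mid
    else:
        hi = mid

p = np.exp(-lo * faces)
p /= p.sum()
print(p)                       # skewed toward the high faces
print(-(p * np.log(p)).sum())  # the maximized entropy H
```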
Using new evidence E and Bayes’ theorem gives the posterior probability (A|EX), often written (A|E).
Odds: O(A|EX) = (A|X)/(a|X) * (E|AX)/(E|aX) = O(A|X) * (E|AX)/(E|aX)
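In odds form the update is a single multiplication. A quick example with numbers of my own choosing: a hypothesis with a 1% prior, confronted with evidence ten times more likely under A than under a, still ends up below 10%:

```python
# Odds-form Bayes: O(A|EX) = O(A|X) * (E|AX)/(E|aX).
def posterior_odds(prior_prob, likelihood_ratio):
    prior_odds = prior_prob / (1 - prior_prob)
    return prior_odds * likelihood_ratio

odds = posterior_odds(0.01, likelihood_ratio=10.0)
print(odds)               # ~0.101
print(odds / (1 + odds))  # back to a probability: ~0.092
```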
Decision theory:
Given possible decisions D_1 … D_n and a loss function L(D_i, θ_j), the loss from choosing D_i when θ_j is the true state of nature, choose the D_i that minimizes the expected loss
<L_i> = SUM_j[ L(D_i, θ_j) * (θ_j|EX) ]
taken over the posterior distribution of θ_j.
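A minimal sketch of the rule (the loss matrix and posterior are made-up numbers):

```python
# Rows are decisions D_i, columns are states theta_j, entries are L(D_i, theta_j).
losses = [
    [0.0, 10.0],   # D_1: free if theta_1 holds, costly if theta_2
    [2.0,  2.0],   # D_2: a hedge, same loss either way
]
posterior = [0.7, 0.3]   # (theta_j|EX)

expected = [sum(L * p for L, p in zip(row, posterior)) for row in losses]
best = min(range(len(expected)), key=expected.__getitem__)
print(expected, "-> choose D_%d" % (best + 1))   # [3.0, 2.0] -> choose D_2
```

Note that D_2 wins even though D_1 is best in the most probable state: expected loss, not the modal state, drives the choice.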
The above rules apply to inductive inference in general, whether or not a frequency in a random process is involved.
General decision theory:
1. Enumerate the states of nature θ_j, discrete or continuous.
2. Assign prior probabilities (θ_j|X) that maximize the entropy subject to whatever information you have.
3. Digest any additional evidence E using Bayes’ theorem to obtain the posterior probabilities (θ_j|EX).
4. Enumerate the possible decisions D_i.
5. Specify the loss function L(D_i, θ_j) that tells you what you want to accomplish.
6. Make the decision D_i that minimizes the expected loss
<L_i> = SUM_j[ L(D_i, θ_j) * (θ_j|EX) ]
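Putting the six steps together on a toy problem of my own (whether to carry an umbrella):

```python
# 1. States of nature theta_j:
states = ["rain", "dry"]
# 2. Prior (theta_j|X): two outcomes, no other information -> rule 4 gives 1/2 each.
prior = [0.5, 0.5]
# 3. Evidence E (dark clouds), assumed four times as likely given rain:
likelihood = [0.8, 0.2]                               # (E|theta_j X)
norm = sum(l * p for l, p in zip(likelihood, prior))  # (E|X)
posterior = [l * p / norm for l, p in zip(likelihood, prior)]
# 4./5. Decisions D_i and loss function L(D_i, theta_j):
losses = {"umbrella":    [1.0, 1.0],    # a mild nuisance either way
          "no umbrella": [10.0, 0.0]}   # soaked if it rains
# 6. Choose the decision with the smallest expected posterior loss:
expected = {d: sum(L * p for L, p in zip(row, posterior))
            for d, row in losses.items()}
print(posterior)                        # [0.8, 0.2]
print(min(expected, key=expected.get))  # "umbrella"
```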
The Kelly Criterion extends decision theory to allocating money in betting and investment: it chooses bet sizes that maximize the expected logarithm of wealth, i.e. the long-run growth rate of the bankroll. For details, see Ed Thorp’s paper "The Kelly Criterion in Blackjack, Sports Betting and the Stock Market" (45 pp., PDF).
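For the simplest case, a binary bet at net odds b with win probability p, the criterion reduces to the standard one-line formula f* = (bp – q)/b (this sketch states the textbook formula, not any particular calculation from Thorp’s paper):

```python
# Kelly fraction for a binary bet: win probability p, net odds b
# (win b per unit staked, lose the stake otherwise).
def kelly_fraction(p, b):
    q = 1.0 - p
    return (b * p - q) / b   # fraction of bankroll; <= 0 means don't bet

print(kelly_fraction(0.55, 1.0))   # even-money bet, 55% win prob -> stake 10%
```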
None of this works unless you use it. A spreadsheet is the easiest way (label everything if you want to understand your calculations later).