Rasch measure of intelligence, ages 2-25, +/- 3 s.d., from the norming of the block rotation subtest of the Woodcock-Johnson IQ test. This is my remake, with the scale changed to years, text replaced, and grid added. Source: Kevin McGrew slideshow "Applied Psych Test Design: Part C - Use of Rasch scaling technology", Slide 19 (2009), which had the original caption: Block Rotation: Final Rasch with norming test; n = 37 norming items; n = 4722 norm subjects. Item map with “steps” displayed for items. Red area represents the complete range (including extremes) of sample Block Rotation W-scores. Good test scale coverage for complete range of population.
Rasch measures of intelligence are an interesting and important part of psychometrics, as they provide an absolute measure of intelligence: not only an "equal interval" scale (as with Fahrenheit and Celsius) but one with a proper zero (as with Kelvin), also known as a ratio scale (not to be confused with the mental/chronological age ratio used in early IQ tests). Because they are ratio measures, Rasch measures allow all arithmetic operations (*, /, +, -, rather than at most + and - for IQ) and form the basis for item response theory (IRT) in general. (See the letter following this post for more.) Rasch measures also have the interesting property of putting item difficulties and test-taker abilities on the same scale, so that if a person with a certain ability score tries an item with the same difficulty score, that person has a 50% chance of success.
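The 50% property can be seen in a minimal sketch of the dichotomous Rasch model. The function name and the use of logit units here are my own assumptions for illustration; the W/CSS scale discussed in this post is a rescaled version of such a scale, and the rescaling is not reproduced here.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are assumed to be in logits (an assumption of this
    sketch, not the W/CSS units used on the chart).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Ability equal to item difficulty gives an even chance of success.
print(rasch_probability(1.2, 1.2))  # 0.5

# One logit above the item's difficulty gives roughly a 73% chance.
print(round(rasch_probability(2.2, 1.2), 2))  # 0.73
```

Only the difference between ability and difficulty enters the formula, which is why people and items can share one scale.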
The above graph was adapted from one used in the block rotation subtest norming of the Woodcock-Johnson IQ test (WJ), a product of Riverside Publishing (a division of Houghton Mifflin Harcourt). The Stanford-Binet (SB5), also published by Riverside, uses the same scale (the "change-sensitive" score or scale, "CSS"), whose only arbitrary choice is setting the CSS for an average 10-year-old equal to 500.
The paper "Assessment Service Bulletin Number 3: Use of the SB5 in the Assessment of High Abilities" has, on page 12 of the PDF (table 4), a reprint from the SB5 interpretive manual of the average full-scale CSS scores for different ages. These closely match the average line in the graph above, so the block rotation subtest average scores vs. age should be a reasonable proxy for the full scale (though there is reason to think the standard deviations on the WJ block rotation subtest shown in the graph are somewhat smaller than those of the SB5 full-scale score). (See the end of this post for table 4 in usable form.) Unfortunately, Riverside seems reluctant to publish the average age vs. CSS or W-score graphs for either full test, let alone for different standard deviations, so using the BR subtest as a proxy for the full scale is the best we can do.
Using a horizontal straightedge on the graph allows equating a given CSS score to z-scores at different ages. (A z-score is a number of standard deviations from the mean; one standard deviation is equivalent to 15 IQ points.) The Mk.I eyeball gives a pretty decent estimate of fractional z-scores falling between the s.d. lines, but one can use the line or measurement tool in a decent paint program such as Paint.NET or GIMP to get better measurements of the z-score that equates to a given CSS at a given age. (Adding a T-square on a movable transparent layer is also useful.) This allows comparing the absolute intelligence of people with different ages and z-scores.
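The straightedge reading can also be approximated numerically. The sketch below is only an illustration: the function names are hypothetical, the adult readings (average 509, +3 s.d. about 534) are the approximate values quoted in this post, and it assumes the s.d. lines are evenly spaced at a given age, which the chart does not guarantee.

```python
def z_from_css(css, mean_css, css_at_ref, ref_z=3.0):
    """Linearly interpolate a z-score from two chart readings at one age.

    mean_css is the reading on the average (0 s.d.) line and css_at_ref
    is the reading on the ref_z s.d. line. Even spacing between the
    lines is an assumption of this sketch.
    """
    sd_in_css = (css_at_ref - mean_css) / ref_z
    return (css - mean_css) / sd_in_css

def iq_from_z(z):
    """Convert a z-score to a deviation IQ (mean 100, s.d. 15)."""
    return 100 + 15 * z

# Approximate adult readings: average 509, +3 s.d. about 534.
z = z_from_css(520, mean_css=509, css_at_ref=534)
print(round(z, 2), round(iq_from_z(z), 1))  # 1.32 119.8
```

Repeating the call with the readings for another age gives the z-score that the same CSS corresponds to at that age.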
Since division is a valid operation on scores on this scale, one can say that, in an absolute sense, the average adult with a score of 510 to 515 is only 2 or 3% more intelligent than the average 10-year-old, and less than 10% smarter than the average 5-year-old with a score of 470.
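The arithmetic behind those percentages is a ratio of scores. A small sketch follows; the helper name is hypothetical, and the scores are the ones quoted above.

```python
def percent_more(score, baseline):
    """Percent by which score exceeds baseline, treating CSS as a ratio scale."""
    return (score / baseline - 1.0) * 100.0

for adult in (510, 515):
    print(adult,
          round(percent_more(adult, 500), 1),  # vs. average 10-year-old (500)
          round(percent_more(adult, 470), 1))  # vs. the 470 quoted for age 5
# 510 2.0 8.5
# 515 3.0 9.6
```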
The 25 CSS point difference between +3 s.d. adults (~534 CSS) and average adults (509 CSS) is within a point of the difference between average adults and average 5-year-olds (~483 CSS).
The 512 CSS score of a +3 s.d. 5-year-old is about the same as that of a +0.5 s.d. 22-year-old, which is about what I would expect the typical graduating psychology major to score. There are many other such comparisons; I have enjoyed hours playing with that chart.
I'd be very interested in finding similar charts for a full-scale test, fluid / crystallized scales or any other sub-tests.
Here is a quote from a letter written in 1999 by the late Prometheus Society member Grady Towers:
There are four levels of measurement generally acknowledged by statisticians. From weakest to strongest, these are nominal, ordinal, interval and ratio. These are important because they determine what kind of statistical procedure can be used. Any statistical procedure using a given level of measurement can be used only on that level. But tests of lesser strength can also be used for the same data. Nominal strength data, for example, can use only tests and procedures appropriate for nominal data. Interval strength data can be tested with interval level tests, but they can also be tested with ordinal level tests and nominal level tests. There's a tradeoff. The lower the level of statistical test used, the fewer assumptions need to be made about the data (normality, symmetry, homoscedasticity, etc), but the larger the sample has to be to reject the null hypothesis.
Nominal scale: numbers are used to name, identify or classify. Telephone numbers are a nominal scale. The correct/incorrect responses used on the items from mental ability tests are also on a nominal scale. Only the statistical techniques based on counting are permitted.
Ordinal scale: numbers represent rank or order. The numbers used to represent the hardness of minerals, from diamond as 10 and talc as 1, represent an ordinal scale. Some people believe that mental abilities represent at most an ordinal scale. Only statistical procedures based on counting, and on greater than or less than are permitted.
Interval scale: intervals between numbers are presumed to be equal. IQ tests are thought to be approximately on an interval scale. They have been described as rubber rulers. Only statistical techniques based on counting, and greater than and less than, and addition and subtraction are permitted.
Ratio scale: all numbers are thought to represent a distance from zero. Weight and distance are ratio scales. All statistical (arithmetic) procedures are permitted, including multiplication and division. This is called the ratio scale because it's permitted to say that one measurement is twice as large as another. Ten feet is twice as long as five feet. This is not permitted on an interval scale. It is not permitted to say that an IQ of 140 is twice as great as an IQ of 70. Do you get the idea?
Rasch scores are not rubber rulers! They are on a rigid interval scale. But what is truly apocalyptic about them is that there is a mathematical transformation that will put them on a ratio scale. For the first time in history, it is possible to say that one person is twice as intelligent as another. For the first time in history, it's possible to construct an intelligence scale with amoebas at one end and Jehovah at the other.
*It's also worth browsing the other psychometric and ultrahigh-IQ society miscellanea at Darryl Miyaguchi's archived site.
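The letter's distinction between interval and ratio scales can be checked with a few lines of arithmetic. This is only an illustrative sketch using temperature and length, the same kinds of examples mentioned in this post; the function names are my own.

```python
def c_to_f(celsius):
    """Celsius to Fahrenheit: both are interval scales with arbitrary zeros."""
    return celsius * 9.0 / 5.0 + 32.0

def feet_to_m(feet):
    """Feet to metres: both are ratio scales sharing a true zero."""
    return feet * 0.3048

# Interval scale: the "ratio" of two readings depends on the unit chosen,
# so "twice as hot" has no fixed meaning.
print(20.0 / 10.0)                  # 2.0 in Celsius
print(c_to_f(20.0) / c_to_f(10.0))  # 1.36 in Fahrenheit (68 / 50)

# Ratio scale: changing units leaves the ratio alone,
# so "twice as long" means the same thing in feet or metres.
print(round(feet_to_m(10.0) / feet_to_m(5.0), 6))  # 2.0
```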
SB5 CSS averages (100 IQ) for age: