Rasch measures of intelligence are an interesting and important part of psychometrics, as they provide an absolute measure of intelligence: not only an "equal interval" scale (as with Fahrenheit and Celsius) but one with a proper zero (as with Kelvin), also known as a ratio scale (not to be confused with the mental/chronological age ratio used in early IQ tests). Because they are ratio measures, Rasch measures allow all arithmetic operations (*, /, +, -, rather than at most + and - for IQ) and form the basis for item response theory (IRT) in general. (See the letter following this post for more.) Rasch measures also have the interesting property of putting item difficulties and test-taker abilities on the same scale, so that if a person with a certain ability score tries an item with the same difficulty score, then he has a 50% chance of success.
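The 50% property follows from the logistic form of the Rasch model. Here is a minimal Python sketch, assuming ability and difficulty are expressed in logits. The function name `p_success` and the `units_per_logit` parameter are illustrative only; converting CSS or W-score points to logits needs the publisher's scaling constant, which is not given in this post.

```python
import math

def p_success(ability, difficulty, units_per_logit=1.0):
    """Rasch (one-parameter logistic) probability of a correct response.

    ability and difficulty must be on the same scale. units_per_logit is a
    hypothetical conversion factor for scales that are not in logits.
    """
    logit = (ability - difficulty) / units_per_logit
    return 1.0 / (1.0 + math.exp(-logit))

print(p_success(0.0, 0.0))            # ability equals difficulty -> 0.5
print(round(p_success(1.0, 0.0), 2))  # one logit above the item  -> 0.73
```

Only the difference between ability and difficulty matters, which is why persons and items can share one scale.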

The above graph was adapted from one used in the block rotation subtest norming of the Woodcock-Johnson IQ test (WJ), a product of Riverside Publishing (a division of Houghton Mifflin Harcourt). The Stanford-Binet (SB5), also published by Riverside, uses the same scale ("change-sensitive" score or scale, "CSS"), whose only arbitrary choice is setting the CSS for an average 10-year-old equal to 500.

The paper Assessment Service Bulletin Number 3: Use of the SB5 in the Assessment of High Abilities has, on page 12 of the PDF (table 4), a reprint from the SB5 interpretive manual of the average full-scale CSS scores for different ages, which closely matches the average line in the graph above, so the block rotation subtest average scores vs. age should be a reasonable proxy for the full scale (though there is reason to think the standard deviations on the WJ block rotation subtest shown in the graph are likely somewhat smaller than for the full scale score of the SB5). (See the end of this post for table 4 in usable form.) Unfortunately, Riverside seems reluctant to publish the average age vs. CSS or W-score graphs for either full test, let alone for different standard deviations, so using the block rotation (BR) subtest as a proxy for the full scale is the best we can do.

Using a horizontal straightedge on the graph allows equating a given CSS score to z-scores at different ages. (A z-score is a number of standard deviations from the mean; one z-score unit is equivalent to 15 IQ points.) The Mk.I eyeball gives a pretty decent estimate of fractional z-scores falling between the s.d. lines, but one can use the line or measurement tool in a decent paint program such as Paint.NET or Gimp to get better measurements of the z-score that equates to a given CSS at a given age. (Adding a T-square on a movable transparent layer is also useful.)

**This allows comparing the absolute intelligence of people with different ages and z-scores.**
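The straightedge reading can be mimicked numerically by interpolating between the s.d. lines at one age. The sketch below is a rough illustration and not the publisher's norming method: it assumes the z-score varies linearly between adjacent lines, and the helper names `css_to_z` and `z_to_iq` are made up for this example. The two adult readings (509 CSS at the mean, about 534 CSS at +3 s.d.) are the ones quoted later in this post; with more lines read off the graph, the list would simply be longer.

```python
from bisect import bisect_right

def css_to_z(css, sd_lines):
    """Linearly interpolate a z-score from (z, css) readings at a single age.

    sd_lines must hold at least two pairs, sorted by strictly increasing CSS.
    """
    zs = [z for z, _ in sd_lines]
    cs = [c for _, c in sd_lines]
    if not cs[0] <= css <= cs[-1]:
        raise ValueError("CSS lies outside the s.d. lines supplied")
    # Index of the upper line of the interval that contains css.
    i = min(bisect_right(cs, css), len(cs) - 1)
    frac = (css - cs[i - 1]) / (cs[i] - cs[i - 1])
    return zs[i - 1] + frac * (zs[i] - zs[i - 1])

def z_to_iq(z):
    """Convert a z-score to a deviation IQ (mean 100, s.d. 15)."""
    return 100 + 15 * z

adult_lines = [(0.0, 509.0), (3.0, 534.0)]  # readings quoted in the post
z = css_to_z(521.5, adult_lines)
print(z, z_to_iq(z))  # 1.5 122.5
```

Repeating the lookup with the lines for another age gives the z-score that the same CSS corresponds to at that age.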

Since division is a valid operation on scores on this scale, one can say that, in an absolute sense, the average adult with a score of 510 to 515 is only 2 or 3% more intelligent than the average 10-year-old (500), and less than 10% smarter than the average 5-year-old with a score of 470.
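Taking the post's premise that CSS is a ratio scale at face value, these percentages are plain ratios of scores. A quick check in Python (the helper name `percent_more` is just for illustration):

```python
def percent_more(score_a, score_b):
    """Percent by which score_a exceeds score_b, treating scores as ratio-scale."""
    return (score_a / score_b - 1.0) * 100.0

print(round(percent_more(510, 500), 1))  # adult (low end) vs. 10-year-old  -> 2.0
print(round(percent_more(515, 500), 1))  # adult (high end) vs. 10-year-old -> 3.0
print(round(percent_more(515, 470), 1))  # adult (high end) vs. 5-year-old  -> 9.6
```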

The 25 CSS point difference between +3 s.d. adults (~534 CSS) and average adults (509 CSS) is within a point of the difference between average adults and average 5-year-olds (~483 CSS).

The 512 CSS score of a +3 s.d. 5-year-old is about the same as that of a +0.5 s.d. 22-year-old, which is about what I would expect the typical graduating psychology major to score. There are many other such comparisons; I have enjoyed hours playing with that chart.

**I'd be very interested in finding similar charts for a full-scale test, fluid / crystallized scales or any other sub-tests.**

*

Here is a quote from a letter written in 1999 by the late Prometheus Society member Grady Towers:

There are four levels of measurement generally acknowledged by statisticians. From weakest to strongest, these are nominal, ordinal, interval and ratio. These are important because they determine what kind of statistical procedure can be used. Any statistical procedure using a given level of measurement can be used only on that level. But tests of lesser strength can also be used for the same data. Nominal strength data, for example, can use only tests and procedures appropriate for nominal data. Interval strength data can be tested with interval level tests, but they can also be tested with ordinal level tests and nominal level tests. There's a tradeoff. The lower the level of statistical test used, the fewer assumptions need to be made about the data (normality, symmetry, homoscedasticity, etc), but the larger the sample has to be to reject the null hypothesis.

Nominal scale: numbers are used to name, identify or classify. Telephone numbers are a nominal scale. The correct/incorrect responses used on the items from mental ability tests are also on a nominal scale. Only the statistical techniques based on counting are permitted.

Ordinal scale: numbers represent rank or order. The numbers used to represent the hardness of minerals, from diamond as 10 and talc as 1, represent an ordinal scale. Some people believe that mental abilities represent at most an ordinal scale. Only statistical procedures based on counting, and on greater than or less than are permitted.

Interval scale: intervals between numbers are presumed to be equal. IQ tests are thought to be approximately on an interval scale. They have been described as rubber rulers. Only statistical techniques based on counting, and greater than and less than, and addition and subtraction are permitted.

Ratio scale: all numbers are thought to represent a distance from zero. Weight and distance are ratio scales. All statistical (arithmetic) procedures are permitted, including multiplication and division. This is called the ratio scale because it's permitted to say that one measurement is twice as large as another. Ten feet is twice as long as five feet. This is not permitted on an interval scale. It is not permitted to say that an IQ of 140 is twice as great as an IQ of 70. Do you get the idea?

Rasch scores are not rubber rulers! They are on a rigid interval scale. But what is truly apocalyptic about them is that there is a mathematical transformation that will put them on a ratio scale. For the first time in history, it is possible to say that one person is twice as intelligent as another. For the first time in history, it's possible to construct an intelligence scale with amoebas at one end and Jehovah at the other.

*It's also worth browsing the other psychometric and ultrahigh-IQ society miscellanea at Darryl Miyaguchi's archived site.

SB5 CSS averages (100 IQ) for age:

| CSS | Age (years) |
|-----|-------------|
| 510 | 16.17 |
| 505 | 12.75 |
| 500 | 10.00 |
| 495 | 8.67 |
| 490 | 7.67 |
| 485 | 6.83 |
| 480 | 6.08 |
| 475 | 5.50 |
| 470 | 5.00 |
| 465 | 4.50 |
| 460 | 4.00 |
| 455 | 3.67 |
| 450 | 3.25 |
| 445 | 2.92 |
| 440 | 2.58 |
| 435 | 2.25 |
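For ages that fall between the tabulated rows, the average CSS can be estimated by linear interpolation. The following is a sketch under that assumption (the real growth curve is smooth rather than piecewise linear), and the name `mean_css_at_age` is made up for this example. The data pairs are copied from the table above.

```python
# (age in years, average CSS) pairs from the table, sorted by age
SB5_MEAN_CSS = [
    (2.25, 435), (2.58, 440), (2.92, 445), (3.25, 450), (3.67, 455),
    (4.00, 460), (4.50, 465), (5.00, 470), (5.50, 475), (6.08, 480),
    (6.83, 485), (7.67, 490), (8.67, 495), (10.00, 500), (12.75, 505),
    (16.17, 510),
]

def mean_css_at_age(age):
    """Estimate the average (IQ 100) CSS at an age by linear interpolation."""
    if not SB5_MEAN_CSS[0][0] <= age <= SB5_MEAN_CSS[-1][0]:
        raise ValueError("age is outside the tabulated range")
    # Walk adjacent rows until the pair that brackets the requested age.
    for (a0, c0), (a1, c1) in zip(SB5_MEAN_CSS, SB5_MEAN_CSS[1:]):
        if a0 <= age <= a1:
            return c0 + (age - a0) / (a1 - a0) * (c1 - c0)

print(mean_css_at_age(10.0))  # 500.0
print(mean_css_at_age(5.25))  # 472.5
```

Ages outside the 2.25 to 16.17 range raise an error rather than extrapolating, since the table says nothing about them.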

Thanks for this very useful post.

So what does this mean for age/mental level for getting the rights of a grown-up? Today IQ 70 age 18 is good to go. That's just over 490 on your chart. But if im using Excel correctly it looks like more than 95% of kids are at that level by age 10!

More than 5% are at 70IQ / 18 year old "adult" level by age 5!

Im wrong, 5% by age 4 and 16% by age 5!! That cant be right?! no wonder they dont let this s$%! out..

Your numbers are about right, but even if we abandon a set age of majority, there are other factors determining competence such as emotional maturity (not that we actually test for that; I'm not impressed with the emotional maturity of 70IQ 18 year olds, for example, or even most college coeds.) There's also a certain level of knowledge that is needed that even the brightest 5 year-olds aren't likely to have. Still, there are many under the age of 18 that have better intelligence, knowledge and wisdom than some enjoying all the rights of adults, and we could come up with better standards than just age for determining majority.
