Over at WSJ Blog, Ashby Jones contacted Robert Morse to get his reaction to my post about how raters should fill out the US News law school rankings forms:
We caught up with Bob Morse, the director of data services for U.S. News, who said in his estimation, the 1-5 options generally speaking matched up with the level of knowledge held by the raters. “We’ve felt that the level of judgment isn’t granular enough to provide a wider scale.”
He also said that because the survey reports the results of the reputation question out to the tenths place, “we’re actually publishing it on a scale of 50; the results average out to be more granular.”
Morse defends the granularity of the US News rankings by pointing to the fact that the average scores have decimal points. Although this is true, it doesn’t address the problem I pointed out in my post: the individuals doing the ratings can’t accurately express their sense of how a school compares to other schools on such a coarse scale.
Who gives Yale a 4 on only a 1-5 scale? Or Harvard? Either such a person has a very different theory of law school reputation, or he or she is trying to game the system.
Let’s say that there are 10 reviewers, and they rate as follows:
Yale scores: 5, 5, 5, 5, 5, 5, 5, 5, 5, 4 = 4.9 average
Harvard scores: 5, 5, 5, 5, 5, 5, 5, 5, 4, 3 = 4.7 average
Morse would conclude that the difference between Yale and Harvard is meaningful. I would conclude that the difference is attributable to either (1) a fluke due to quirky beliefs of a very small number of raters; or (2) gaming by some raters. I just don’t see how, on a 1-5 scale, Yale or Harvard would get any less than a 5 on all the forms. Their averages should both be 5.0. Any differences are the result of flukes or gaming and shouldn’t be taken seriously.
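The arithmetic in the hypothetical above is easy to check. A quick sketch in Python, using the made-up scores from the example (the numbers are illustrative, not real survey data), shows how a single dissenting rater on a 10-person panel moves the published tenths-place average:

```python
def average(scores):
    """Mean of a list of rater scores, rounded to tenths,
    matching how US News reports reputation averages."""
    return round(sum(scores) / len(scores), 1)

# Hypothetical panels of 10 raters from the example above
yale = [5] * 9 + [4]        # nine 5s and one 4
harvard = [5] * 8 + [4, 3]  # eight 5s, one 4, one 3

print(average(yale))     # 4.9
print(average(harvard))  # 4.7
```

In other words, the entire 0.2 gap between the two schools here is produced by two raters out of twenty total scores; the other eighteen scores are identical. That is the sense in which outliers, rather than any broad consensus, drive the reported difference.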
These problems exist beyond Yale and Harvard — they persist for the entire US News survey because there’s not enough granularity. If there’s a granularity problem for individual raters that makes their ratings flawed, then the problem doesn’t just disappear by aggregating flawed ratings.
With all due respect to Morse, I must also disagree with his first point. As my post demonstrated, Morse is wrong in his statement that “the level of judgment isn’t granular enough to provide a wider scale.” If he’s right, then readers of my post should conclude that my hypothetical dean is way outside the norm. But I’m willing to bet that among those in the academy, the consensus on the schools I mentioned in the post is that they deserve different ranking levels: Yale, Michigan, Cornell, USC, Emory, and American. My sense is that there’s a consensus that most raters would rate them in the order listed, and would think that they are all at different reputational levels.
If this is the consensus, then Morse’s 5-point scale isn’t granular enough for individual raters. And aggregating scores assigned under a rating system that isn’t granular enough doesn’t fix the problem. It just means that the outliers control the outcome, and those outliers are either people with views way outside the norm or people who are gaming the system. I don’t think we want the ratings to turn on what the outliers are doing.
I appreciate Morse’s response to Ashby Jones, and would be interested in his response to my points above.
Please note that I’m not an expert on statistics, so I’m open-minded about my claims. If there’s a statistics expert among our readers, I’d be very interested in your thoughts.
Originally Posted at Concurring Opinions
* * * *
This post was authored by Professor Daniel J. Solove, who through TeachPrivacy develops computer-based privacy training, data security training, HIPAA training, and many other forms of awareness training on privacy and security topics. Professor Solove also posts at his blog at LinkedIn. His blog has more than 1 million followers.