I know much has been written about the Ricci v. DeStefano case, both pro and con. In this post, though, I want to focus on the test. Full disclosure: I'm a test skeptic. I think test results are overvalued for most jobs, I think most tests probably are inherently biased toward certain privileged members of society, and I think the overvaluation of tests has a discriminatory impact on marginalized groups. But, those (big) questions aside, has anyone raised the question of whether New Haven's test for firefighter captain really accurately predicts who will be a good captain? I confess I don't really know the details of either the test or the qualifications of the job. But I have some serious skepticism that a written test can really evaluate whether you will be a good leader of an emergency response team.
In my readings on how people make decisions (yes, I know, I haven't shut up yet), I learned that the NFL requires every quarterback in the draft to take an intelligence test (called the Wonderlic). This is the NFL's way of trying to gauge how good someone will be at making the kind of decisions quarterbacks have to make. The problem is that the test is wildly inaccurate for that purpose. Some of the most successful quarterbacks (Brett Favre, Terry Bradshaw, Dan Marino) had woeful Wonderlic scores, well below average. And some of the most forgettable quarterbacks (you've probably never heard of them) scored quite high. As it turns out, filling in bubbles in a written test has almost nothing to do with making decisions "in the pocket." It potentially weeds out great players and elevates mediocre ones. The best way to tell whether someone will be good at something is testing their performance during a simulation of the actual job. Not a surprise, I'm sure, but in that case, how can written tests continue to be justified for jobs that do not involve that kind of decision-making?
I recognize that this question has broader implications -- including for legal education. Do law school exams really predict how lawyers will perform "in the pocket"? I am a skills teacher, so maybe I'm a little biased. But now, as law firms consider "apprenticeships" and other new models for legal hiring, I wonder whether it is time to rethink the centrality of the test.
No, I think you make a great point. Some people are horrendous test-takers because of the pressure and time constraints. Those same people may really know their stuff in the field, but do horribly in actual tests.
Posted by: Joe | July 01, 2009 at 10:59 AM
While I'm sure the New Haven test was not perfect, I don't think your comparison to the Wonderlic is fair. I believe that the Wonderlic test is given to all NFL prospects, not just QBs. And the Wonderlic is really a general intelligence test, not a test designed to "gauge how good someone will be at making the kind of decisions quarterbacks have to make." For example, I do not think it tests you on hypothetical situations you would face in the pocket as a QB, or asks you whether you would scramble or pass to receiver A, B, or C. So you would not necessarily expect a high correlation between Wonderlic scores and QB on-field performance. While I have not studied all the details of the New Haven written test, according to the majority opinion it was designed to test "job-related knowledge," and developed from source materials including training manuals, etc. In addition, the oral exam (weighted 40%) included hypothetical job situations and asked the individual how he or she would respond. Thus, while success on the New Haven exam almost certainly does not perfectly correlate with successful performance as an officer, it is probably significantly more likely to accurately predict job performance than the Wonderlic test would be for NFL QBs.
Posted by: Jason Bent | July 01, 2009 at 11:10 AM
Fair enough. And you are absolutely right that all players have to take the Wonderlic (though my understanding, admittedly minimal, is that particular attention is paid to how quarterbacks score, because of the premium placed on the quarterback's ability to make decisions, memorize plays, etc.).
But putting the flaws in the analogy aside, what about my ultimate point about the assumptions underlying "chalk and talk" type tests -- that if a person scores well on a memorization test, or one that tests cognitive ability, they will perform well in the field (cf. bar exam)? This is where I see the connection between the Wonderlic and the NHFD test (and perhaps law school exams). I understand that the NHFD test was more elaborate than the Wonderlic (as are law school exams), but I think they are based on similar assumptions.
Posted by: Kathy Stanchi | July 01, 2009 at 11:56 AM
Sorry to take the post a bit away from its important and interesting topic, but I find it quite interesting that, apparently, offensive linemen (especially centers and tackles) do the best on average on the Wonderlic test. I expect that this goes strongly against the stereotype for the brains of various players.
Posted by: Matt | July 01, 2009 at 05:20 PM
I understand your larger point, and I think it is a fair one. Certain tests are undoubtedly bad predictors of their target performance measure, and maybe that is especially true for memorization tests. But each test needs to be considered on its own and in context. How accurately does it predict its target performance? Does memorization or knowledge of the items tested actually matter in the job? There are cost/benefit tradeoffs to developing and administering a more perfect test to a large group, especially if it is heavily simulation-based. Who should decide whether the test is a "good enough" predictor to use to draw distinctions between candidates? The employer? The courts?
Posted by: Jason Bent | July 01, 2009 at 06:05 PM
How about the best "test" of all--that required by those in legal academia?
Please answer yes or no:
1) Did you attend a top law school?
2) Did you graduate at or near the top of your class?
3) Clerkship?
4) Publication in a top law review?
If you answer yes to 3 out of 4 you've clearly got what it takes to be a [great] law professor. If not, you're out of luck.
Posted by: Anon | July 02, 2009 at 10:49 AM
Interesting points, all. Thanks.
I guess at the root of this is the problem I purported to sidestep but really didn't. In the NHFD case, I am deeply troubled by the intersecting problem of (i) a test that may evaluate skills different from those required by the job and (ii) the clear disparate impact of the test on certain disadvantaged groups. I just can't believe that so many more white firefighters are so much more qualified to be captains than African American firefighters. The starkness of the disparity in test results suggests to me a (serious) flaw in the test. That disparity should, in my view, significantly impact the cost/benefit analysis of finding a better test. The question of who decides is a good one and very tricky, especially here where the employer thought it was a great test until they saw the results, and I guess we all know what the Court thought.
And, yes, I agree, Anon, this intersection (bad test/disparity) is reproduced in so many other areas of education and employment, including law school exams, the bar exam, and law professor hiring. (As for offensive linemen, I'm sure I'd have an opinion if I knew anything about what they do!)
Posted by: Kathy Stanchi | July 02, 2009 at 12:44 PM