
Curveball

04 Oct 2007 12:59 pm

This Volokh Conspiracy thread on a student suing over a curved "C" set me wondering: why are curves so common, anyway? Both my schools graded on a curve, which arguably served as a check against grade inflation--but are the incentives for grade inflation really so great that it couldn't be held back by a general agreement among the faculty that a "C" is average?

Moreover, the curve isn't merely for overall performance; it's done on each exam. In some classes, a 55% on an exam can be an A. But why do faculty, particularly at the undergraduate level where the task is mastery of a basic body of knowledge, set exams where the majority of the students can't answer a majority of the questions? Or, conversely, as I've also seen happen, where the difference between an A and a C is a few points, because everyone scored in the high 90's? Is figuring out what your students are likely to know really so hard for an experienced teacher?

Comments (27)

I agree that grading on a curve is obnoxious. If I know the material why should I be hurt because others in the class do too? Or why should I be helped because everyone was as clueless as I was? Either I know the material, or I don't. Either the professor is capable of determining what I know, or he isn't. If everyone in the class gets very high or very low grades then he can change his approach for the next exam/assignment.

Having said all that, this lawsuit is absurd.

Grade inflation is a classic collective action problem. Faculty as a group have an incentive to control it, but each professor individually has an incentive to grade high, because higher grades correlate strongly with better student evaluations. Hence curving, to enforce cooperation against grade inflation.

A test in which 55% is an A can be useful, because it disaggregates among those students near the top of the class. A test in which the difference between an A and a C is a few points because everyone scored in the 90s is just badly written, indicating an instructor who's either not very experienced, or not paying that much attention.

Ditto Sam, but also, to what extent are professors concerned with their students doing well in their classes aside from student evaluations? Professors are more concerned with their research and publishing papers than getting good reviews from students, and if your university was anything like mine, it's not like the student opinions really fueled much of the administration's decision to grant tenure.

I don't understand the objections to curves, in general. Even if there were no curve in a particular class, professors would have to adjust the difficulty of their classes so that not everybody gets A's or F's. For almost all courses (even in science) there is no absolute standard of what you need to know, and in the end you'll be compared to everyone else anyway.

That being said, there are lots of possibilities for how you curve something. If you do a separate curve every class that probably motivates students more, but it's obviously less consistent. If you use the same test every year it's more consistent, but it's really probably much less fair, because some people will have old tests. If you change the test every year, then norming the grades between years is probably too difficult for most professors to do. I personally think that making a different curve every semester is the best way to do things, but it depends on your particular goals and type of class.

And if the class is going to get a large curve at the end I think it's a good idea to curve each test. How else is anyone going to know whether their 57% means they're doing great or failing?

"Is figuring out what your students are likely to know really so hard for an experienced teacher?"

Experienced profs don't usually teach first year courses (particularly not the 200 person lecture hall ones.) And judging by the reactions to test results of some profs I've had, yes, they do have a hard time. I recently got back a fluid mechanics exam with an average below 50 (I was the highest grade with a 72 0_o .) It even extends beyond exams. Most times a student asks a question in a class, the prof has difficulty understanding what's unclear to the student. It's often been 20+ years since they first learned the material.

The one advantage to a difficult exam is that they test understanding of the concepts on a deeper level. Exams that most students do well on generally consist of nothing but slightly reworked homework problems. A student can just do a rote repetition and make a decent grade. A problem unlike the ones they've seen forces them to think.

Grading on a curve is an easy way to get the class grades distributed on a bell curve. I suspect that is the main reason why it's done. Of course, any given class probably won't have a bell curve distribution of performance.

In a highly technical class, like science or engineering, you should be able to come up with some sort of standard for what should have been learned and test for that. If everyone learns it, everyone should get an "A". I don't want a civil engineer who got an "A" just because he was the least dumb engineer in a class of morons to design a bridge. I want the civil engineer to get an "A" because he learned the material.

It would be particularly egregious to learn that a statistics professor was grading on a curve...

Sometimes, solutions are chosen because they are easy, not because they are right. Grading on a curve seems to be one of those.

In Law School, the case for grading on a curve might be different... if the point is to foster competition and if it is assumed that everyone is intelligent and capable of learning the material, then grading on a curve may serve a useful ranking function as opposed to an indication of having learned anything in particular. I still don't like it, but it's not indefensible.

EI

I use a lot of spot quizzes in my class, partially to evaluate ability, diligence, and learning, but also partially to encourage students to read the material for the class that day, rather than to read only in arrears for the mid-term and final. (I have found that when students have done the required reading I can work at a higher level and they get more out of the class.)

In making up my quizzes I have to vary the questions from semester to semester because of transmission of past quizzes to current students via frats and other associations, and because often a student takes my course on the recommendation of a friend who took it previously. In making up questions I aim for a mean of 70% but find that sometimes what I thought would be an easy question turns out to be difficult, and vice versa; I tend to have a good feel for the difficulty but I do make mistakes. To control for the difficulty, and in order to make sure that if I say something has a weight of X% in the final grade it has that weight, rather than having the weight be a by-product of the variance on the item (quiz, test, or other assignment), I standardize all scores, subtracting the mean on the item and dividing by the standard deviation.
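The standardization described above can be sketched in a few lines of Python. This is only an illustration, not the commenter's actual procedure: the function names, the sample scores, and the choice of the population standard deviation (rather than the sample one) are all assumptions.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Return z-scores: subtract the item's mean, divide by its standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        # Everyone got the same score, so the item cannot separate students.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def weighted_total(items, weights):
    """Combine standardized items so each one counts in proportion to its stated weight."""
    z_items = [standardize(scores) for scores in items]
    n_students = len(items[0])
    return [
        sum(w * z[i] for w, z in zip(weights, z_items))
        for i in range(n_students)
    ]


# Hypothetical class of three students, each item weighted 50%.
quiz = [60, 70, 80]    # narrow spread of raw scores
final = [20, 50, 80]   # wide spread of raw scores
totals = weighted_total([quiz, final], [0.5, 0.5])
```

On raw scores the final would dominate the total simply because its spread is three times wider. After standardizing, both items have the same spread, so the 50/50 weights mean what they say.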

Lastly, as far as curves are concerned, my experience is that markets reward relative ability, not absolute. We get excited about a fund manager that adds a percentage point or so of alpha to their fund, rather than penalizing them for falling so far short of the thousands of percentage points of return they could have made had they had perfect foresight.

Writing tests is hard, and writing tests of consistent difficulty is almost impossible. Some sort of compensation is necessary so student grades reflect their performance more than fluctuations in test difficulty.

Then again, at my college, "grading on a curve" meant adding 10-20 percentage points to everyone's score, rather than rigidly forcing everything into a normal distribution.

I am shocked to hear of a teacher curving "down". I will do an end of quarter "curve", but only if the class average is a little low, so as to give the students a bump of a few points. If I curved down, there would be rioting.

I also saw his class was one on social theory, which I imagine means the grading mechanism is papers and essays. Grading these is always a pretty subjective process. If the class average is too high, it's because this prof. is too lenient of a grader. It's best to be a hard ass grader, then give your students a few points at the end. Makes you look like a real nice guy...and less likely to get sued.

As someone on the writing and grading end of exams these days (mostly freshman level calculus), the answer mostly comes down to the fact that yes, writing exams is hard, and in particular, it's something we get no actual training in. Questions we think should be easy can turn out to be extremely hard for any number of reasons (to give one example, if I mention in passing right before an exam that X is important, and the first question on the exam is about Y, which is superficially, but only superficially, like X, that throw away comment may significantly change the number of students who wrongly treat Y as if it were X).

As for all the complaints about getting punished or rewarded by being in a particularly good or bad class; first, the statistical variation probably isn't as big as you think it is. Second, we can actually identify the difference between a better class and a worse one, and (at least at the mid-sized research institutions I've been at) set the curve accordingly. If a particular class really is better than average, the curve will get pegged higher than average, too, to compensate.

Most people aren't using curves because they're trying to assess the students against each other. The point of a curve is to assess the tests, by using the a priori estimate of what the overall student performance should look like, and then use the estimated difficulty of the test to grade the students.

Finally, to answer Megan's specific questions:
1) Sometimes an exam averages at 55% because the professor overreached with the material. And some professors just like leaving a lot of room for the best students to shine. (When I think there are a couple of particularly talented students in a class, I'm tempted to put one or two really interesting, but hard, problems on a test to encourage them, and just use the curve to make sure the rest of the class doesn't get hurt for it.)

2) If the difference between an A and a C is only a few points crammed in the high 80s, it's almost certainly because the exam was too easy.

3) I don't know anyone who likes curving each test; most people would rather just curve the overall score, and some do. However, when a professor says that there will be an overall curve but doesn't say what scores on each exam correspond to, they will be hounded incessantly by students begging to know what their numerical score represents (even if detailed information about the distribution of scores is provided). Most people just give up and start curving each exam.

I've had at least a couple professors justify aiming for a median score of about 50% to get a normal distribution (which is useful-- if they're all stacked around 90%, it's hard to figure out which students really understand the material and which only have a superficial understanding). This does foster competition, as others have mentioned, but it also does students the service of removing the teacher from the equation somewhat... if a teacher isn't very good and grades absolutely, they're punished for the teacher's shortcomings. While it's true that a student is similarly punished when being graded on a curve for being in a class with a lot of high-performing students, it's more likely that one teacher is bad than thirty students are all particularly good... so in general, grading on a curve is more generous to the students, if done properly.

That said, as Henry mentioned, "[i]f the difference between an A and a C is only a few points crammed in the high 80s, it's almost certainly because the exam was too easy." Grading downwards after writing an overly-easy exam is trash, from a statistical perspective, because you don't have the necessary information to make statistically significant differentiations between students. This is obvious intuitively, as with a curve like that, a single random error could easily mean a full letter grade difference between two otherwise identical students.

My HS trig teacher would list the class's scores on the chalkboard in descending order before returning the tests. Clusters of students were usually obvious. "These guys obviously learned the material, these ones made some mistakes, this group needs to study a little more. Please see me after class if you're down here." He was an experienced teacher so the A B C D lines were usually pretty close to the standard numerical cutoffs.

I liked it -- you knew where you stood in class, if an exam was particularly tough you weren't punished too hard, and it added a little excitement to trig.

Another place where low top scores can be a well-constructed test is in a situation where there is more info than anyone is expected to know. This isn't uncommon in Chemistry, Biology, the actuarial exams, and such areas. Depending on your area of strength, you should know 2/3 of this material cold--but which 2/3 varies from student to student.

Henry's last three points pretty much sum it up nicely for me.

"Ditto Sam, but also, to what extent are professors concerned with their students doing well in their classes aside from student evaluations? Professors are more concerned with their research and publishing papers than getting good reviews from students, and if your university was anything like mine, it's not like the student opinions really fueled much of the administration's decision to grant tenure."

They can matter a bit at the margins, but one thing that I wouldn't discount (and what frankly surprised me when I started teaching undergrads) is the sheer amount of harassment that you can get from undergrads who didn't get a good grade, even to the point where some will have their parents send threatening letters to the dean. If you're teaching a large class, it's so much easier to curve students' grades up from a D to a C than have 20 students at the end of the semester in your office crying about how you ruined their lives, or how they got A's in all of their other classes, so it's your fault if they got a B in yours.

I've always thought that curving tests makes it hard to determine how well the teachers are teaching.

The point of technical classes should be to teach a set of skills, facts or abilities. Extraordinarily difficult questions should be extra credit.

Professors should write the tests for other teachers to discourage grade inflation (hopefully.)

My job as a college teacher is to teach my students to the best of my ability, which to me means getting as many of my students as I can to do well. My working assumption, the only one that makes sense to me as a teacher, is that any student who actually does the work should succeed unless I've failed to find a way to teach that particular student. Since a failure on my part is paid for by the student (with a bad grade that also represents a waste of money, effort and time) I feel obligated to think that way. And since I teach at a selective Liberal Arts college (where, having no grad students we teach our own classes) I usually teach students who are highly motivated, intelligent and commonly educated well enough at the secondary level that no one who commits to doing the work in my classes should fail. And no one who makes that commitment does. They're that good, so am I.

My job is teaching, not grading. While grading is something that can serve an educational purpose (meaning that it can be of value to students) if it is designed and implemented well enough, it is often nothing more than a sorting mechanism that provides a value to end users of the mechanism, usually graduate and professional schools and some, though by no means all employers. Grading on a curve is simply one way of constructing the mechanism and in itself has no more intrinsic educational value than does the number of grades one can give (five in my elementary school, three in grad school, eleven where I teach now).

Meaningful academic evaluations are far more detailed, instructive and time consuming than assigning a letter or number grade. The fact that we've constructed an educational system that makes meaningful evaluations difficult if not impossible does not make the grading systems we do use useful or even logical. When we evaluate students effectively the grade itself is the least important thing we communicate.

Whenever I think about grading I'm struck by the fact that most of the valuable things I've learned and continue to learn in the field I teach in, I learned outside of school. That has something to do with the commitment my private teachers have had to my success, a commitment that wasn't affected by the need to sort their students into categories, and a lot to do with my career experience which has been conducted with a concern for its success certainly, but with no concern at all for my GPA.

'I am shocked to hear of a teacher curving "down".'

Me too. It is much easier to make the exams too hard, and curve up. Nobody likes to be curved down. It isn't even a question of knowing all of the material. You can make the tests arbitrarily long. Those who know the material perfectly might not finish. Those who know it less well will complete fewer problems.

I know in grad school, professors had other motives. They'd test to see if you knew the material, but they always laid some killer problems on us. They wanted to see what you could do when pushed. They were looking for students with whom they would want to work or publish.

Sam - I agree with you that there are pressures on individual professors to grade high, but I disagree with what those pressures are. The pressure to give out good grades is not, in my experience, an attempt by individual professors to garner positive reviews from their students. In general, such reviews aren't taken all that seriously by tenure committees.

There is, however, a powerful disincentive to hand out poor grades. The more poor grades you hand out, the greater the chances a student will contest one of the grades you give out, a circumstance made more likely by the increasingly mercenary attitude students take toward college. But having a student contest a grade is a pain. Depending on the institution, it may involve having to write an extended defense of the grading, or a nearly endless series of meetings with the student, the student's faculty advisor, the dean of students, your department chair and who knows how many other college administrators. It's a huge hassle, and for what? To defend a grade you gave to one student in one class who, regardless of what grade he earned, will almost certainly have forgotten 90 percent of what was taught within a year. No, thanks. I'll worry about grading integrity once I have tenure.

N.

This discussion reminds me of the time I graded a midterm exam for an undergraduate organic synthesis class given by one of my professors at Northwestern in the early 90s. The exam was so difficult that nearly every test I graded missed every question- most missed by miles. Without a curve that day, everyone would have failed, and even with a standard curve, most would have failed. We ended up giving Cs and Ds to people who scored between zero and ten points.

Some schools, particularly some small colleges that emphasize teaching over research, place a great deal of importance on student evaluations at tenure time. While the administration at my school is reluctant to quantify it, it is generally understood to be fifty percent of the equation. And students evaluate professors in other ways, by choosing whether or not to take their classes for example. I teach in the arts and most of my students are choosing to take my classes out of interest and/or to fulfill an arts requirement in the core curriculum. If the average grade in my classes were a "C" I can guarantee you that my classes would never fill no matter how well I taught the content. And when they didn't fill my bosses would take that as an indication that something is very wrong with my teaching.

One of the most bizarre things about the degree to which colleges worry about grading their students is how little emphasis is placed on grades when faculty hiring decisions are made. I've been on ten search committees for tenure track faculty at two very different schools and the most important thing we've looked at when it came to a candidate's degrees is where they came from. While we've asked for sealed, official undergraduate and graduate transcripts when we've reached interview stage, that's done simply to avoid fraud. I've never known the undergrad or graduate GPA of a candidate even at the offer stage and I never expect to.

As for grade inflation, it's a non-problem. If you want to find out who the best students are and you're using GPA's to determine that, look for the better students on the right side of the distribution curve and the poorer ones on the left. That will be true no matter where the fat part of the curve is. Better yet, use class rankings. It's simpler.

One of the things that I had hoped for with your move to the Atlantic was that trackbacks would work.

Can't call method "path_info" on an undefined value at lib/MT/App.pm line 1342.

Oh well.

Here's a response to

But why do faculty, particularly at the undergraduate level where the task is mastery of a basic body of knowledge, set exams where the majority of the students can't answer a majority of the questions?

My background as an undergraduate is in physics and mathematics, so most exams involved being able to solve various problems.

If the exams are made easy, then you primarily find yourself measuring how fastidious the students are. Pretty much everyone can solve the problems, so what distinguishes students is their ability to avoid 'careless errors': small mistakes of accounting that can lead to loss of points for what is effectively otherwise an acceptable approach.

If you make exams very hard, then you primarily find yourself measuring how deeply the students understand the material. Only students with a deep grasp can even begin to answer all the questions, and any points lost to a lack of fastidiousness are small compared to points lost due to failure to have grasped the material sufficiently to solve harder problems.

In mathematics and the hard sciences, I would argue that harder tests measure what you are really interested in: a deep grasp of the material. They simply have more resolving power (i.e., the ability to separate students by understanding of the material). *That* is why (at least in technical fields) it's acceptable to give tests that when curved put an A at 55%.

Please note, the above can be taken to extremes. I am aware of at least one course in electricity and magnetism suffered through by those 3 years ahead of me where anything over 10% was an A. That's probably taking it a bit too far...

One thing that confuses the discussion is what is meant by "curve."

One meaning is to force the grades onto a bell curve distribution of letter grades... for example, 2 A's, 4 B's, 10 C's, 4 D's, and 2 F's. In this system, no matter how well the students do, 2 will fail. I strongly object to this sort of curve.

The other meaning is to slide the scale up or down so that the best or worst grade is in some range. For example, if the grades range from 55 to 12, you might set the threshold for an A at 50. In this scheme, everyone could still get an A or a B or you could have no student fail if they performed closely. This system makes more sense, especially if you assume that your top student(s) will generally be worthy of an A.
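The difference between the two meanings shows up clearly in code. The following Python is a rough sketch under stated assumptions: the function names are made up, the 10-point width of each letter band below the A cutoff is an assumption (the comment only fixes the A threshold at 50), and ties in the forced distribution are broken by list order.

```python
import math


def forced_distribution(scores, quotas):
    """First meaning: letters go by rank alone, with a fixed quota per letter."""
    letters = [letter for letter, count in quotas for _ in range(count)]
    if len(letters) != len(scores):
        raise ValueError("quotas must add up to the class size")
    # Indices of students from highest to lowest score (ties keep list order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    grades = [None] * len(scores)
    for letter, i in zip(letters, order):
        grades[i] = letter
    return grades


def sliding_scale(scores, a_cutoff, step=10):
    """Second meaning: fixed cutoffs, one letter lower per `step` points below the A cutoff."""
    grades = []
    for s in scores:
        bands_below = 0 if s >= a_cutoff else math.ceil((a_cutoff - s) / step)
        grades.append("ABCDF"[min(bands_below, 4)])
    return grades


# Hypothetical class whose scores are all within a few points of each other.
close_scores = [55, 54, 53, 52, 51]
by_quota = forced_distribution(
    close_scores, [("A", 1), ("B", 1), ("C", 1), ("D", 1), ("F", 1)]
)  # ['A', 'B', 'C', 'D', 'F']
by_cutoff = sliding_scale(close_scores, a_cutoff=50)  # ['A', 'A', 'A', 'A', 'A']
```

With the quota, a four-point gap between the best and worst student spans the whole range from A to F; with the sliding cutoff, everyone who performed closely gets the same letter.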

Ideally, though, the test is scaled properly, but that's impossible to do every time.

EI

I remember a test in Thermodynamics. I got part of one of the three questions right. My 14 out of 100 was a C. The top score was a 40...

Of course, with the Vietnam War and the draft, the stakes were potentially higher than bruised egos.

How tests are written and graded depends on what the results are used for. I've seen two diametrically opposed philosophies that were well justified - for the field that was covered.

1) As an Air Force electronics technician, working on an airplane that men would climb into and fly aggressively, anything less than 100% correct would get men killed. Of course, perfection was too high a goal to realistically require in training. IIRC, the pass/fail level in tech school was 85%, on tests that were "easy" in the sense that there were no questions that required more than hard studying and a 100 IQ to get right, but that were quite comprehensive. In my class of 6, everyone made some 100% tests, and we helped each other study until everybody passed each test. 4 scored in the 90's overall, and I missed the top spot by 0.1% - so one moment of misunderstanding, or maybe even a stray pencil mark, made the difference between #1 and #2. School was followed by a long period of on the job training, from which we emerged almost perfect - and then by checking each other's work, we finally achieved results that were very close to perfection.

Just one person washed out during on the job training - the one who beat me to the top score. For a tech it's not enough to know everything, you also need the manual dexterity to get the work done.

2) My high school Honors Chemistry teacher tried to write tests with a wide variety of difficulty in the questions, so anyone who'd paid attention in class at all could get some right, and yet the best students would struggle with some questions. He thought he had failed if anyone scored 100%. He'd say that if I could score 100%, "The test didn't measure what you can do." Then he curved the results, setting the mean = C and calculating standard deviation (or some other measure of spread) to determine the size of each grade box.
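A curve of the kind described here, with the mean pinned to a C and the grade boxes sized by the standard deviation, might look like the Python sketch below. The details are assumptions, since the comment does not give them: each box is taken to be one standard deviation wide, the C box is centered on the mean, and the population standard deviation is used.

```python
from statistics import mean, pstdev


def curve_to_mean_c(scores):
    """Assign letters by distance from the class mean, measured in standard deviations."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    grades = []
    for s in scores:
        z = 0.0 if sigma == 0 else (s - mu) / sigma
        # Assumed boxes: C covers mean +/- 0.5 SD, every other box is 1 SD wide.
        if z >= 1.5:
            grades.append("A")
        elif z >= 0.5:
            grades.append("B")
        elif z >= -0.5:
            grades.append("C")
        elif z >= -1.5:
            grades.append("D")
        else:
            grades.append("F")
    return grades


# Hypothetical scores: mean 50, standard deviation 5.
scores = [40, 50, 50, 50, 50, 50, 50, 60]
letters = curve_to_mean_c(scores)
```

Because the boxes scale with the spread, a hard test with a low mean and an easy test with a high mean produce the same letters for students in the same relative position.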

So his main goal was to accurately distinguish the best from the good, and his tests did that quite well. He was teaching a science - a field where what is most interesting isn't what is known, but what isn't yet known. I think his tests did as much as can possibly be done with high school sophomores at picking out those that might be good enough to explore those unknowns.

He did not have a standard for how much you should know to pass, but if you were too far down at the tail end of the distribution, you'd fail and have two years to re-take and pass the regular version of the class. The C and lower students weren't going to be chemists unless they had talents they hadn't been using, so the failing end of the curve was there mainly to encourage everyone to do the best they could. Studying until you gain a passing understanding of something that you are *not* good at is also an essential life skill.

We're at reiteration stage now, but: ETS and similar outfits spend massive amounts of money and have hundreds of employees to generate new tests every year that successfully distinguish between levels of knowledge, aren't too repetitive with past tests, but generate results that are fairly comparable with them. Writing a good test is hard. ETS can test their instruments over and over again before using them. We only have the ability to adjust after the fact.



Copyright © 2008 by The Atlantic Monthly Group. All rights reserved.