I was thinking about something only tangentially related to grading, when it struck me that the way we go about generating student grade point averages is the kind of mind-bogglingly stupid system that requires lots of smart people working together to produce. Two very different groups of smart people, with very different ways of looking at the world.
For a scientist, the starting point for assigning grades is generally a set of scores on a bunch of individual assessments. These are generally combined to form some sort of weighted average, which can be expressed as something like a percentage of the total possible points earned. This percentage is then converted to a letter grade (possibly with letter-plus and letter-minus steps, depending on the institution). Then those letter grades are converted back to numbers based on the four-point scale, and those numbers are averaged to produce an overall GPA. Which is reported to three decimal places so we can rank-order students.
But from a signal-processing sort of standpoint, this is remarkably stupid. I have what is basically a continuous analog signal in the initial percentage grade, which is then crudely digitized into a letter grade with a limited set of discrete steps, and then converted back to an analog signal by averaging a bunch of letter grades. The middle step of converting to a discrete letter scale then converting back is pointless at best, and probably introduces extra noise. You’d be much better off averaging together the original percentage scores.
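The round trip described above is easy to see in a few lines of code. This is a minimal sketch: the 90/80/70/60 cutoffs and the 4.0-scale values are illustrative assumptions (no plus/minus steps), not any particular school's policy.

```python
def letter_points(pct):
    """Quantize a percentage into discrete 4.0-scale steps (no +/- grades)."""
    if pct >= 90:
        return 4.0  # A
    if pct >= 80:
        return 3.0  # B
    if pct >= 70:
        return 2.0  # C
    if pct >= 60:
        return 1.0  # D
    return 0.0      # F

percentages = [91, 89, 81, 79, 71]

# Averaging the original "analog" signal directly:
direct = sum(percentages) / len(percentages)  # 82.2

# Quantize to letters first, then average -- the intermediate
# discretization is where the extra noise comes in:
gpa = sum(letter_points(p) for p in percentages) / len(percentages)  # 2.8
```

Note what the quantization does to nearby scores: the 89 and the 91 are one point apart but land a full grade point apart, while the 81 and the 89 become indistinguishable.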
But, of course, I say that because I’m a scientist, and my classes tend to involve lots of grades that are easily rendered into a roughly continuous numerical format. The letter-grade scale is a foolish and clumsy add-on to this. Faculty in “the humanities,” though, and others teaching classes where the vast majority of the final grade is determined from a single paper are much better served by the cruder letter-grade scale; they’re starting with grades that naturally fall into a smaller number of reasonably discrete categories.
But, of course, we have to do something to summarize the performance of students in a compact manner, and employers and graduate schools want rank-ordered lists of class standing, and the false precision of extra decimal places has a seductive allure. So we’re stuck with this foolish system of crude and pointless intermediate discretization…
On the other hand, the cutoff for an A is not the same in every course, particularly where partial credit is involved in numerical problems. If 85% is an A in an intro science class, then calculating the GPA from letter grades would yield a higher number than using straight percentages.
People tend to object to changes that would lower their metrics, so alas…
I agree with abought – the transformation from percentages to letter grades can be thought of as a normalization and smoothing process – you’re not losing information, you’re removing noise and signal bias.
That said, calculating GPAs down to three decimal places is absurd. Even the second place is probably pushing it. (A 0.01 GPA difference is equal to a C versus a B in just 1 out of 100 classes, at which point you’re well into random variability.)
The whole issue is the compulsive need to exhaustively rank students. What’s wrong with saying that two students are about equal academically? If you really wanted to completely rank students, you should probably have some sort of head-to-head tournament system. “May Madness”, perhaps, and the professors can fill out brackets as an office pool. (100 points for correctly predicting the valedictorian!)
Binning reduces the systematic effects of instructor and discipline variance. If you get rid of it, I think you’ll find that grading is dominated by systematics.
It is even dumber that we generate weighted averages for the internal grade for each class, but the overall GPA is only crudely weighted by the number of units assigned to each class, leading to gaming of GPAs by students.
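The credit-hour weighting this comment complains about is just a weighted mean, which makes the gaming easy to illustrate. The course data here is invented for the example:

```python
# Credit-weighted GPA: each course's grade points are weighted by its
# credit hours. Courses below are made up for illustration.
courses = [
    # (grade points, credit hours)
    (4.0, 1),  # easy one-credit seminar
    (2.0, 4),  # hard four-credit lab course
]

total_credits = sum(hours for _, hours in courses)
gpa = sum(points * hours for points, hours in courses) / total_credits
# (4.0 * 1 + 2.0 * 4) / 5 = 2.4
```

Padding a schedule with easy high-grade credits pulls the weighted average up, which is exactly the gaming the comment describes.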
When I was in high school (in Saskatchewan), and in college (at the University of Saskatchewan), my transcripts reported a two-digit percentage for each class, and my GPA was calculated as the average of those percentages. So that does sometimes happen.
There’s a secondary issue, and that’s the loss of information in reporting just a single number (or letter) for each course. That made sense when it was expensive to record and report anything more, but today data storage and transmission are dirt cheap. I’ve heard of experiments in recording more, but it seems academia has a lot of inertia.
It’s a whole stack of dumb. Quantize, then aggregate, then quantize is absolutely silly (see http://en.wikipedia.org/wiki/Discursive_dilemma for a way to explain this to non-techies). But wanting an exact ranking of students is demanding more precision than exists.
If you built a car this way, it’d fall apart before anyone sat in it. It’s lucky for society that nobody cares after a few years out of school anyway.
gaming of GPAs by students
This is already a problem in high school, or at least it was at the high school I attended in the 1980s. There, GPAs were calculated by including bonus points (1 for each honors class, 2 for each AP class). At my high school, it led to a disproportionate interest in AP European History, which had a reputation as an easy class. It also had more insidious effects, such as the classmate of mine who had a reputation for acquiring the best grades money could buy.
Not that the alternatives would solve the problem. Any system can be gamed, and will be once people figure out how. Especially when the competition becomes high-stakes.
Yeah, gaming is a problem in any system that can be gamed, pretty much. It also goes the other way– I was the valedictorian of my high school class, but mostly thanks to the honors-class bonus. The woman who was salutatorian had slightly better grades on a strict numerical scale (we used numbers, not letters), but didn’t take as many hard classes (I took four years of both Latin and French, and two AP classes in which I was the only student). I would’ve been a little peeved if she had edged me out…
There are some universities – the state universities in Washington come to mind – that give individual grades to two significant figures. But that gets you wondering if you can _really_ tell the difference between a 3.2 student and a 3.3 student.
The stupidest thing here is actually averaging the grades at all. In one class the professor might make 2.6 the average grade, and in another class it might be 3.6. These are not quantities you should average. If the student had thousands of professors you could average over these things, maybe (although there are systematics depending on university, major, and year), but it’s possible a student could have as few as three professors for an entire year (3 year-long sequences).
I would suggest that Evergreen State and New College have the right idea with narrative evaluations rather than letter grades. Human ability and performance have too many dimensions to appropriately combine into a single metric, and too many students spend their time worrying about the grade rather than the learning. From my point of view, the entire system of grading is broken, not only the GPA.
Since a grade represents nothing more than one professor’s opinion as to how any student did compared with others in the same class, there’s a very simple solution to the many problems of grading discussed in the article and in the comments.
Schools should report how well a student did in each class compared with the mean instead of – or at least in addition to – a letter or numerical grade. This can best be done – and done with more precision – by using numerical scoring rather than letter grades and their equivalents.
A so-called Z-score of 0.0 shows that the student’s performance was equal to the mean and therefore just average, whereas a Z-score of 1.0 shows performance one standard deviation above the mean (about the top 16 percent of the class), etc.
The Z-score from each class can be averaged, weighted by the number of credits for each course if desired, just as numerical grades (and/or the numerical equivalents of letter grades) can be averaged to produce something like an overall GPA.
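The proposal in the last few paragraphs can be sketched directly. This is a toy illustration with invented class data; the point is that the same raw score means very different things in an easy class and a hard one.

```python
from statistics import mean, pstdev

def z_score(student_pct, class_pcts):
    """Student's distance from the class mean, in standard deviations."""
    return (student_pct - mean(class_pcts)) / pstdev(class_pcts)

# Same raw 85%, very different meaning:
easy_class = [95, 90, 85, 80, 75]  # mean 85: an 85 is exactly average
hard_class = [75, 70, 65, 60, 55]  # mean 65: an 85 is well above average

z_easy = z_score(85, easy_class)   # 0.0
z_hard = z_score(85, hard_class)   # about 2.83

# Credit-weighted average of Z-scores, analogous to a GPA:
scores_credits = [(z_easy, 3), (z_hard, 4)]
overall = sum(z * c for z, c in scores_credits) / sum(c for _, c in scores_credits)
```

Using the population standard deviation (`pstdev`) here treats the class as the whole population of interest; a school might reasonably prefer the sample version instead.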
If schools do not report this information voluntarily, perhaps employers should insist upon it to rein in the arms race in grades called grade inflation, and to make the reported grades far more meaningful.