Teaching Evaluations and the Problem of Unstated Assumptions

There’s a piece in Inside Higher Ed today on yet another study showing that student course evaluations don’t correlate with student learning. For a lot of academics, the basic reaction to this is summed up in the Chuck Pearson tweet that sent me to the story: “Haven’t we settled this already?”

The use of student course evaluations, though, is a perennial argument in academia, not likely to be nailed into a coffin any time soon. It’s also a good example of a hard problem made intractable by a large number of assumptions and constraints that are never clearly spelled out.

As discussed in faculty lounges and on social media, the basic argument here (over)simplifies to a collection of administrators who like using student course evaluations as a way to measure faculty teaching, and a collection of faculty who hate this practice. If this were just an argument about the most accurate way to assess teaching quality in the abstract, studies like the one reported in IHE (and numerous past examples) would probably settle the question. But it’s not, because there’s a lot of other stuff going on. And because that other stuff is never clearly stated, a lot of what people wind up saying in the course of this argument is not actually helpful.

One source of fundamental conflict and miscommunication is over the need for evaluating teaching in the first place. On the faculty side, administrative mandates for some sort of teaching assessment are often derided as brainless corporatism– pointless hoop-jumping that is being pushed on academia by people who want everything to be run like a business. The preference of many faculty in these arguments would be for absolutely no teaching evaluation whatsoever.

That kind of suggestion, though, gives the people who are responsible for running institutions the howling fantods. Not because they’ve sold their souls to creeping corporatism, but because some kind of evaluation is just basic, common-sense due diligence. You’ve got to do something to keep tabs on what your teaching faculty are doing in the classroom, if nothing else in order to have a response when some helicopter parent calls in and rants about how Professor So-and-So is mistreating their precious little snowflake. Or, God forbid, so you get wind of any truly outrageous misconduct on the part of faculty before it becomes a giant splashy news story that makes you look terrible.

That helps explain why administrators want some sort of evaluation, but why are the student comment forms so ubiquitous in spite of their flaws? The big advantage that these have is that they’re cheap and easy. You just pass out bubble sheets or direct students to the right URL, and their feedback comes right to you in an easily digestible form.

And, again, this is something that’s often derided as corporatist penny-pinching, but it’s a very real concern. We know how to do teaching evaluation well– we do it when the stakes are highest– but it’s a very expensive and labor-intensive process. It’s not something that would be practical to do every year for every faculty member, and that’s not just because administrators are cheap– it’s because the level of work required from faculty would be seen as even more of an outrage than continuing to use the bubble-sheet student comment forms.

And that’s why the studies showing that student comments don’t accurately measure teaching quality don’t get much traction. Everybody knows they’re a bad measure of teaching quality, but a good measurement isn’t practical, and also isn’t really the point.

So, what’s to be done about this?

On the faculty side, one thing to do is to recognize that there’s a legitimate need for some sort of institutional oversight, and look for practical alternatives that avoid the worst biases of student course comment forms without being unduly burdensome to implement. You’re not going to get a perfect measure of teaching quality, and “do nothing at all” is not an option, but maybe there’s some middle ground that can provide the necessary oversight without quintupling everybody’s workload. Regular classroom observations, say, though you’d need some safeguard against personal conflicts– maybe two different observations: one by the dean/chair or their designee, and one by a colleague chosen by the faculty member being evaluated. It’s more work than just passing out forms, but better and fairer evaluation might be worth the effort.

On the administrative side, there needs to be more acknowledgement that evaluation is less about assessing faculty “merit” in a meaningful way, and more about assuring some minimum level of quality for the institution as a whole. Student comments have some role to play in this, but they’re mostly customer satisfaction surveys, not serious assessments of faculty quality. In which case they shouldn’t be tied to faculty compensation, as is all too often the case– if there must be financial incentives tied to faculty evaluation, they need to be based on better information than that, and the sums involved should be commensurate with the level of effort required to make the system work.

I don’t really expect any of those to go anywhere, of course, but that’s my $0.02 on this issue. And though it should go without saying, let me emphasize that this is only my opinion as an individual academic. While I fervently hope that my employer agrees with me about the laws of physics, I don’t expect that they share my opinions on academic economics or politics, so don’t hold it against them.

7 comments

  1. Wait–customer satisfaction surveys don’t measure an important aspect of faculty quality? Admittedly, there are good and bad reasons for liking or disliking a professor, but a professor no one likes or no one can understand needs help. Or are we saying students aren’t an important part of the college experience?

  2. “a professor no one likes or no one can understand”

    The problem is that these types of comments are often code for “We don’t want to be taught by someone who isn’t like us.” Many Americans, especially from rural areas, reach university age without ever encountering accents other than Standard American except possibly as comic relief characters on TV. Then they go to college and encounter a professor who speaks perfectly fluent English, but since that professor did not grow up in the US, the accent is other than Standard American. Some students find ways to deal with that. But too many don’t even make the effort. And often “not liking” a professor can be reduced to, “I do not like thee, Dr. Fell/The reason why I cannot tell.” Yes, that includes racism and nationalism, and even sexism.

    True, there are some professors who are incapable of explaining the material on a level the students can understand. I’ve had such professors myself. That should be called out when it happens. But to be useful, student evaluations have to have a way of distinguishing this case from the professor who speaks fluent English with an Oxbridge, Indian, German, Chinese, or other accent that the student is unwilling to understand.

    I think the main objection to student evaluations as a measure of how well a professor is teaching is that the tool is poorly suited to the task at hand. But as Chad notes, the better suited tools are much more resource-intensive, so you treat the problem like a nail because your hammer is the only tool you have.

  3. Some of the most valuable “customer satisfaction surveys” are done AFTER the product has been purchased and owned for a month or several years. (I am thinking of Consumer Reports, but also Amazon.) Only stupid corporations ask you to review the “experience” before the product is even delivered, which is when we do “teaching evaluations”.

    That means student evaluations should be done months or a year after the course ends. You want to evaluate physics? Ask the junior taking mechanics or thermodynamics or electricity and magnetism (engineering or physics). You want to evaluate first semester calculus or AP calculus? Ask the student taking second semester calculus. That is a huge positive in your tenure review process. It means you take learning, rather than “teaching”, seriously. Too bad you don’t use samples in your regular post-tenure reviews. It is a great idea.

  4. You do not point out that faculty are just as wedded to these teaching evaluations as administrators are. As an administrator, I have tried to reduce my dependence on student evaluations when annually evaluating faculty performance, and it is often the faculty who do not wish to let go. When I suggest that these student evaluations do not really measure teaching effectiveness, and that we could therefore simply assume that all faculty are effective teachers, the faculty who receive high praise from their students do not want to accept this strategy.

  5. Basically, you’re saying that teaching evaluations conform to the same responsibility-shunning pattern as Value at Risk (VaR) measurements in banking: “VaR is not ubiquitous because traders and CEOs are unaware of its flaws. It is ubiquitous because it allows senior managers to project the facade of effective supervision without taking on the trouble or the legal risks of actually monitoring what their traders are up to. It is sophisticated enough to protect against the charge of willful blindness and it allows ample room for traders to load up on the tail risks that fund the senior managers’ bonuses during the good times. When the risk blows up, the senior manager can simply claim that he was deceived and fire the trader.”
    Ass-covering for academic administration, rather than anything useful. Basically, get good teaching evaluation stats, and we can pretend good teaching was happening, and so pretend that everything is fine.
    For the quote, see “How to commit fraud and get away with it: a guide for CEOs,” at http://www.macroresilience.com/2013/12/04/how-to-commit-fraud-and-get-away-with-it-a-guide-for-ceos/
    Note the date, and then see if it predicted what went on at Wells Fargo.

  6. These studies show weak correlations between students’ own judgments about what they learned and multiple-choice tests measuring a sample of what they learned. Maybe the tests aren’t measuring the “real” learning.

  7. When did a party trick like student evaluations start being taken seriously as a metric? In the 1970s these evaluations were sort of an amusement for students to compare notes on professors. As CCPhysicist has noted, you can’t really evaluate a course until you start taking the next course in the series. This just sounds like lazy management.

    For an amusing look at gender in teaching evaluations:

    http://benschmidt.org/profGender/#%7B%22database%22%3A%22RMP%22%2C%22plotType%22%3A%22pointchart%22%2C%22method%22%3A%22return_json%22%2C%22search_limits%22%3A%7B%22word%22%3A%5B%22funny%22%5D%2C%22department__id%22%3A%7B%22%24lte%22%3A25%7D%7D%2C%22aesthetic%22%3A%7B%22x%22%3A%22WordsPerMillion%22%2C%22y%22%3A%22department%22%2C%22color%22%3A%22gender%22%7D%2C%22counttype%22%3A%5B%22WordCount%22%2C%22TotalWords%22%5D%2C%22groups%22%3A%5B%22unigram%22%5D%2C%22testGroup%22%3A%22B%22%7D
