What Does It Take to Evaluate Teaching? – Uncertain Principles Archive

In which we compare a couple of different systems for evaluating teachers, looking at what’s involved in doing a fair assessment of a teacher’s performance.

——–

Another casualty of the great blog upgrade, in the sense of a post that was delayed until the inspiration for it had been forgotten by most of the people who might want to talk about it, was this Grant Wiggins post on accountability systems:

[The Buckingham, Browne, and Nichols prep school where he taught in the 80’s] had a state of the art teacher performance appraisal system back in the 80’s (we’ll need current or recent folks to tell us if it is still operating this way). Every teacher at BB&N was evaluated by peers as well as supervisor (in most cases, the supervisor was your Dept. Head at the Upper School where I worked). The person being evaluated chose a colleague within the department as well as someone outside of the department (along with the Dept. Head). It was everyone’s obligation to visit 3-4 classes and have pre- and post-observation conversations about what they saw, look at lesson plans and student work, then write it all up for consideration by the Dep’t Head and Headmaster. In addition, the person being assessed would write up a lengthy self-assessment as to strengths and weaknesses, and offer any commentary on the write-ups by the appraisal team. As I recall, there were no hard and fast criteria or rubrics that had to be used. The teacher being evaluated could actually propose some of the criteria and evidence to be used.

In my 3 part-time years at BB&N I was evaluated once and did one peer evaluation at the request of a colleague in the English Dept. I found it an uplifting, collegial and educational experience each time – more so, oddly enough, when I was a peer assessor than when I was being assessed.

Drawing on that experience, he proposes a formal accountability system that could be used to assess teaching performance accurately. It does include pre- and post-test scores as one component, but it also includes the sort of peer evaluation and commentary he describes above.

What really struck me about this was how closely it mirrors the tenure-review system here at Union. For example (regrouping his points somewhat):

Peer review should complement supervisory review

Some portion of the evaluation should include a written statement by the “team” that the teacher most often works with (grade-level team, department) as to their abilities as a team player.

Our tenure system does incorporate comments from the department chair and administrators, but the information-gathering phase also includes interviews with departmental colleagues. The final decision formally rests with the Dean of the Faculty and the President, but the recommendation on whether to grant tenure is made by a faculty committee.

The teacher should be able to propose evidence of success beyond the ones proposed by the state, district and supervisors in order that the picture be rounded out and the teacher in question feel ownership of (at least some) of the evidence used in the decision

As part of the process, faculty up for tenure submit statements on teaching and research, which gives them exactly this sort of opportunity to describe what they’re doing and why.

Teachers should be able to respond in writing to the draft evaluation

One of the most angst-inducing parts of the tenure review process is the infamous “second interview” with the ad hoc committee that gathers the information. After the committee has completed all its interviews and gathered comments from students and external reviewers, it sends the candidate a list of questions covering every potentially significant issue that was raised in a negative way during the review. The committee then meets with the candidate to discuss those points, after which the candidate can submit a written response to be included with the report.

It’s probably not surprising that these systems mirror each other. Wiggins is interested in setting up an evaluation system that is as fair as possible and will generate as little controversy as possible (thus his concerns about teacher “ownership” of the results and about providing opportunities for response). The tenure system here is making the ultimate high-stakes judgement (having passed this review, Union will never be rid of me), so the college has a strong interest in making the evaluation as accurate as possible. And again, it is trying to minimize controversy, so as not to get sued by candidates who feel they were done wrong.

What these systems are not is efficient. Both are very labor-intensive, for the people being evaluated and for the people doing the evaluating. Faculty and administrators need to spend time sitting in on classes, writing and reading statements, and combining all these different measures to reach some sort of final determination. That is why this kind of review is a one-time thing at the college level (well, one-and-a-half: there’s a reappointment review in a professor’s third year that includes most of the elements of the tenure review but isn’t quite as comprehensive), and why the sort of review Wiggins describes happens only at small elite prep schools.

This is why people talking about “teacher accountability” tend to default to things like test scores (perhaps analyzed in a value-added manner), and why our more regular merit evaluations rely on simpler inputs (course evaluations, lists of publications, brief statements). Doing teacher evaluation right takes a lot of time and effort, much of which has to come from the people who set the policies. And a system like the one Wiggins proposes doesn’t make for a punchy sound bite in a political stump speech, or provide an easy stick with which to bash unions. Hence high-stakes testing, which is politically convenient but educationally disastrous.

It’s important to have these ideas out there, though, to remind people that we do know what it takes to evaluate teaching well. And while the system Wiggins proposes is unquestionably more labor-intensive than just punching test scores into a spreadsheet, it can at least provide a starting point for finding a system somewhere between the extremes of “horribly inefficient but fair and accurate” and “seemingly efficient but laughably inaccurate.” Provided, of course, that anyone is willing to make a good-faith effort.

Comments

Peter Morgan says:

June 7, 2012 at 1:58 pm

Just discovered that you’re not on vacation, that in fact the old RSS feed stopped working. Some time ago! I remember vaguely the prospect of a great blog upgrade, but I didn’t notice that the RSS feed would just stop. Just installed the new RSS feed and now I’m catching up. Congratulations on your 10th anniversary, etc.
Ian Bearden says:

June 12, 2012 at 10:02 am

I am not sure I have so much to add, except:
1. I think this is an extremely important issue and one that needs more discussion.
2. A sane and valuable system may not seem “efficient”, but it seems to me that we may not have the appropriate metric for measuring efficiency. If we could count all the (potential) benefits* of a good “peer review” of teaching, it might not seem so inefficient.

Thanks for bringing this up!
-Ian
*Presumably, both the reviewed and those reviewing should learn through this process. On the other hand, maybe I am a bit too Pollyanna-ish, and this will always end up as ticking boxes to save time for writing grant proposals?
