Student Evaluations Promote Mediocrity

Colleges and universities should encourage good teaching—but what is “good teaching” and how do we recognize it? Since the 1970s, these questions have seemingly been settled. The solution adopted by most schools was to administer an anonymous questionnaire asking students to rate the teacher, the course in general, the instructional material, and a medley of teaching-relevant traits—for example, did the teacher treat me with respect, was the grading system fair, and so on.

These fill-in-the-circle questions, with some space allocated for written comments, appeared to be scientific and were thus a godsend for accountability-minded administrators.

Nevertheless, this strategy is deeply flawed, the triumph of convenience over intellectual substance, and, most critically, it further dumbs down a university and erodes its standards.

Prior to the 1970s, judging instructional quality took a different form. It was far more labor-intensive, often informal, and took place at regularly scheduled reviews such as the granting of tenure. Most important, it stressed intellectual content.

Course materials were central. For colleagues assigned to evaluate a teacher’s ability, even a cursory glance at a syllabus would reveal intellectual substance via the instructor’s choices on readings, paper assignments, and overall rigor. Course materials showed how much class time was spent on empty-calorie student reports versus lectures, and whether key duties were farmed out to teaching assistants or guest speakers. How many hours were consumed by pop films, “fun” field excursions, razzle-dazzle technology, and similar academically undemanding activities? Endless true/false or simple multiple-choice questions on trivial details hardly confirmed an instructor’s commitment to tough academic standards.

The key here is professional judgments by academic peers, not amateur opinions by those unable to distinguish an authoritative textbook from an Idiot’s Guide to European History (that is, students). The evaluation system assumed that professors were professionals committed to a core body of knowledge, not restaurateurs assembling a focus group to discover popular tastes. Scrutinizing these materials was tedious but hardly impractical; any experienced educator could instantly spot Mickey-Mouse education, and even a stellar classroom performance cannot make education any less Mickey Mouse.

Equally critical was the instructor’s intellectual stature. Academics are always calibrating each other’s professional expertise and, thanks to reading colleagues’ papers, attending departmental seminars, and even engaging in casual “shop talk,” sound judgments are quickly reached. This was certainly relevant for determining good teaching. An intellectually struggling colleague was unlikely to impart knowledge, regardless of outstanding questionnaire scores. Yes, it might be argued, an uninformed instructor could still energize his or her students, but what of value was actually conveyed? And can students accurately discern the difference between fluff and substance? Intellectually committed professors will always choose content over style, save in extreme cases where a brilliant scholar mumbles incoherently. Moreover, good teaching often involves inflicting pain (e.g., requiring students to master statistics or read original texts), and pain is hardly the route to popularity.

Lastly, in this “pre-student-questionnaire” approach, enrollments and grade distributions were indicators of quality teaching. It was fairly easy to identify those whose very name on the timetable was the kiss of death for a course. Others were well known for their ability to send students fleeing after the first lecture. Still others attracted students only due to a well-publicized ideological agenda. Equally telling was the grade distribution: those overly generous with the “A” who failed no one could hardly claim to be champions of intellectual excellence.

The old-fashioned approach did not exclude student opinion—letters from students were often solicited. But these comments were only a small part of the overall, professionally assembled picture, and in any case, such letters were typically more detailed and thoughtful than the sentence-or-two comments that suffice for feedback on contemporary ten-minute survey instruments.

Today’s student-survey approach may tell us how students viewed the course, but the data tell us nothing about actual learning. It is not that questionnaire designers disdain knowledge; they just cannot measure it, and thus they exclude a key element of teaching. Ironically, universities can now hire or retain teachers who impart nothing of value but have superb ratings.

Questionnaire defenders will naturally insist that the old-fashioned, unsystematic mishmash strategy also neglected knowledge, but this is not true. Course syllabi, assignments, exams, and grades cannot precisely certify that learning did or did not occur; but such indicators, fuzzy as they might be, are far superior to “On a scale of 1 to 5, how would you rate how much you learned in this course?” It is hard to insist that this single number outshines the data-rich “old-fashioned” assessment, especially when you add in the professor’s intellectual standing and grading standards.

The current popularity-driven evaluation method also impedes serious learning. When tangible rewards flow from high ratings, it is perfectly rational to demand less of the students and reward them more. It is an open secret among older faculty that burdensome reading lists have shrunk. Add grade inflation that keeps “customers” happy, boosts enrollments, and perhaps swells department budgets, all despite often mediocre work, and it is no wonder that contemporary graduates often seem ill-prepared. They are.

But the greatest intellectually relevant difference between the old and the new approach is the redefinition of “good teacher.” Previously, a “good teacher” positively influenced a few of the most talented students. These were the revered intellectual giants of their era, Morris Cohen (the legendary CCNY philosophy professor) or Leo Strauss (the University of Chicago teacher of political philosophy) among others, who profoundly shaped the lives of countless students. For all intents and purposes, those who disliked these stars were irrelevant in establishing a teacher’s standing. Reputations rested entirely on intellectual achievement.

Now, however, being a “good teacher” means having a high average rating. A few he-changed-my-life outliers count little in a class of 50 or 100, especially if the instructor also antagonized five students in the same class. A savvy instructor is thus advised to target the middle, and forget about lighting fires in the most gifted. Catering to the averages also helps boost enrollments, no small matter as academic budgets become body-count driven.

In sum, the so-called new and improved teacher assessment slights intellectual rigor. If American higher education were serious about high standards, we would return to the previous labor-intensive but intellectually centered system. Alas, since the questionnaire approach is so lazy-friendly, this is unlikely to happen. The life of the mind is now conveniently simple: brief cookie-cutter questionnaires convert “good teaching” from imparting important knowledge into whatever students, regardless of their expertise, think “good teaching” is; and even then, the verdict rests on averages.

One can only imagine Socrates being hauled into the Dean’s office and told that while a few of his students would probably profoundly influence Western civilization thanks to his quirky teaching style, nearly all the rest were clueless, except for knowing that he admitted knowing nothing. Perhaps, counseled the Dean, ever anxious about tuition, he should take some tips from Professor Aristophanes and jazz things up with a dash of rowdiness. “I’ll drink poison first,” he was rumored to say.