The folks at OpenAI released a new version of their GPT language model, in chat-bot form, which has kicked off another round of hand-wringing about what it will do to academia. I wrote about this back in August, and haven’t really changed my opinion, but since it’s a live topic again, and I have more readers now, I may as well take another whack at it.
I think a fundamental problem here is that a lot of discussions about writing in academia are conflating two related but distinct purposes for which you might assign students to write things. In an effort to avoid more obvious but kind of loaded terms I’ll call these “formative” and “evaluative” writing. I think only the evaluative piece of this is really meaningfully threatened by AI text generators; the formative piece is largely unaffected, and can still be useful as long as you’re clear about what you’re doing.
The evaluative piece is, of course, the bit where you assign students a question to answer, and read their answers as a way to determine their level of understanding so you can give them a grade. This is unquestionably challenged by the ready availability of AI models that can reliably churn out text that plausibly resembles the output of a typical undergraduate. It’s also arguably the least important piece of the process.
The formative piece is the bit where students actually learn the material through the process of having to write about it— having to take source material, process it, rephrase it, and express it in a way that conveys the meaning to a different person. It’s the core thing I’m doing with this blog a lot of the time— I have strong but muddled opinions about something that’s making news or generating discussion, and I open a new post and start typing as a way to clarify my thoughts. I not infrequently end up in a different place than I thought I was going to when I started the post, and sometimes have to go back and change the provisional title I put at the top when I first opened the editor.
That process of synthesis and clarification is where the actual learning happens. In my own field, this mostly takes the form of problem sets where the conclusion is often given at the very start. All those questions of the form “Show that under these conditions the lowest energy of that system is given by this expression” are posed to students not because there’s any clever contrarian Take that could argue for a different answer, but because writing out all the steps is valuable. It forces you to think about how you get from the third line of the derivation in the book to the fourth, and all the fiddly details of why those pesky additional terms cancel out. You don’t really understand a problem until you’ve worked it through in detail, and written it out in a way that makes clear to someone else how it all works.
And, as I said back in August, all of this stuff is readily available to students in a pre-written form, and has been for decades. It’s more available now than back in the ’90s, when you had to find a paper copy of one of those samizdat solution manuals, or the one obscure textbook that used the problem as a worked example. But the problem of students being able to defeat the evaluative purpose of homework by simply copying solutions is not remotely new.
We still assign those kinds of problems as homework not because they’re especially useful in evaluating understanding but because they serve a formative purpose. And the engaged and responsible students know that, and will put in the work because they know what they’re getting out of it. There’s a grade assigned, yes, but that mostly functions as activation energy— a little external incentive to sit down and put in the work that leads to learning.
I think that if you’re clear and up-front about the formative purpose of assignments, you don’t really need to worry about AI chatbot solutions any more than you do about the existence of scanned solution manuals on the Web. I told my students in the just-concluded quantum class that I’m well aware that there are detailed solutions for most of the textbook problems online that can be found with an incredibly cursory Google search— in fact, I made heavy use of them myself when writing up the solutions. But the point of the assignments has never been just getting the final answer: the real point is always the work along the way that helps nail down the key concepts and processes. And students who actually care about the courses they’re taking and are invested in the education they’re making for themselves will, for the most part, do that work in an honest way.
Are there students who don’t care, and won’t put in the work? Absolutely. But, to put it bluntly, I don’t really give a shit about them and what they do. I’ll put a little effort into making it harder for them to skate by—and if we’re being honest, it doesn’t generally take much— but I don’t think it’s healthy to re-shape an entire course around the fear that somebody might be getting away with cheating themselves out of learning the things they’re supposed to. The real point of this job is to provide educational opportunities for students who want to take advantage of them. If somebody wants to gamble eighty grand of their parents’ money on being able to cheat their way through a year of college, that’s on them.
This is, of course, somewhat discipline-dependent— in fields where the “right answers” are less objective, it’s likely harder to cleanly separate the formative from the evaluative. But looking at the discussion of the latest chatbot on academic social media, I think that faculty are massively over-weighting the evaluative piece relative to the formative one. I suspect that reflects a long-term collective failure to think carefully about why we assign the things we do, leading to an institutional conflation of formative assignments with evaluative ones. I think the challenge of these language models is best addressed not by new rules or invasive anti-cheating measures, but by thinking clearly about what purpose is served by each assignment, and most importantly by being up-front with students about that.
If the only reason you’re asking students to do a particular assignment is so you can rank-order their answers and convert that ranking to letter grades, then, yes, that assignment is threatened by the existence of large language models. But then, you probably ought to re-evaluate whether that’s a thing that you really want (or ought) to be doing in the first place.
In what seems likely to be a coincidence of timing— at least, it appears to have hit my Inbox a bit before the chatbot discussion exploded on social media— Timothy Burke has a typically long and thoughtful piece about related issues. I disagree with some of his more dystopian conclusions— the idea that these bots will generate more administrative bafflegab seems to rest on the premise that sending those notes and memos requires significant effort from the people who send them, which I think is flawed. Fully adapted Deans generate this kind of thing as effortlessly as Burke and I generate rambling 1500-word blog posts. But overall, it’s interesting and worth a read.
I do kind of recoil from his call for more “expressive” writing, though, which I think ties back to the formative vs. evaluative thing, and also reflects a disciplinary difference. That is, I’m much less interested in having students develop a distinctive voice to their writing, because what they’re writing in any physics class I would teach is necessarily kind of rote— the start and end points are given in the question, and the steps to get from the one to the other are limited and formalized. Attempts to be distinctive in writing physics are more likely to confuse than to clarify, for both writer and reader, and as such will tend to obstruct the actual purpose of the assignment. (This also probably connects back to the issues with analogies and overhype associated with the whole wormhole kerfuffle from yesterday’s post. That’s another full post, right there…)
And, of course, authentically expressive and distinctive writing runs the risk of being actively irritating to the reader. Which very definitely gets in the way of the evaluative piece of things— do I dislike this piece because its conclusions are unsupported, or is it just because I find the author’s voice grating? But that’s a whole other Thing…
Anyway, his piece is well worth reading, so check it out.
This is arguably duplicative of what I said in August, but, hey, I warned you of that at the top. If you’d like to see what the third try looks like a few months from now, here’s a button:
And if you want to use a chatbot to generate a pseudo-thoughtful response to this for me to evaluate, the comments will be open:
One weird thing about blogging that I have realized is that I am much more aware of repetition than readers are...
Hey. I'll just repeat my comment then... :)
The evaluative piece of the work is what we use to determine professional hierarchies. That is, there's a lot of money at stake.
The main argument for not worrying about cheating is that it's hard to sustain consistently, and thus unlikely to truly influence your positioning or the overall outcome. Getting a B+ versus a B- in a single class (even an A+ versus a B-) is unlikely to have life-altering effects.
But the easier it is to carry out, especially over time, the more worried we should be.