Limitations and Ethical Challenges of Using Generative AI in Teaching
10/8/2025 8:00 am
This entry continues a conversation from a previous monthly meeting about uses of generative AI for our community. In the first entry in this series, we provided a "cheat sheet" for which generative AI tools work best in different situations (and when you might want to think about switching to a new tool!); in the second entry, we presented tips and tricks for using AI in your workflow.
Most recently, we discussed some limitations and ethical questions surrounding its use in research and everyday applications, and in this entry, we extend that discussion to its use in teaching.
Does AI have a place in the classroom?
Should AI be introduced responsibly into classrooms, or banned to promote independent learning? Experts and instructors may disagree on the particulars, but any consideration of the optimal path forward must center on a key feature of learning: “productive struggle.”
Often, we give the advice “work smarter, not harder” to encourage completing assignments/projects in more efficient ways, not just slogging through. Taken to the extreme, though, this advice can overlook a crucial developmental piece of the learning process: sometimes a slog is exactly what retention and breakthroughs require. A recent report from Bellwether explored how AI affects productive struggle - the process of “engaging in tasks that are just beyond current mastery levels, supported by timely feedback and opportunities to iterate.” This allows students to “build knowledge, resilience, and agency.” This process is not always comfortable, and it is highly tempting to exit it by using generative AI tools to work around those feelings of discomfort and friction. The report also notes, “The risk is more than cheating; it is about students outsourcing the hard, mental work, like generating ideas or grappling with ambiguity, that builds their capacity to think independently.”
Of course, the ideal would be for students to use generative AI in thoughtful ways that improve learning outcomes. However, that is not currently happening at any significant scale. A study performed by Anthropic on its generative AI tool, Claude, found that 47% of student-AI conversations showed limited engagement - evidence that students are largely using these technologies to avoid mental discomfort and complete assignments quickly, rather than attempting to learn through the process.
When we also account for the fact that generative AI tools bring their own black box of ethics to their outputs and interactions, and that they lower neuronal engagement (discussed here), allowing persistent generative AI usage by students may be doing long-term damage to their ability to think critically, reflect, parse nuance, and reason ethically. Of course, as the report notes, “not all friction may be inherently beneficial, and not all ease may be harmful.” We have to ask: “when does ease enable greater learning, and when is ease a shortcut with a hidden cost?”
However, pushing students away from generative AI usage isn’t easy. Many teachers are experiencing significant frustration as they see lowered competence, less effort, and less engagement with the materials. This article from 404 Media compiled some reflections from professors on their frustrations with the rise of AI-generated content submissions. Another professor reflects, in an article in The Walrus, “I once believed my students and I were in this together, engaged in a shared intellectual pursuit. That faith has been obliterated over the past few semesters. It’s not just the sheer volume of assignments that appear to be entirely generated by AI—papers that show no sign the student has listened to a lecture, done any of the assigned reading, or even briefly entertained a single concept from the course.”
While one solution to this may be to embrace in-class, handwritten assignments, the author notes that “Writing in a classroom can never approximate the sustained concentration required to produce a carefully thought-out, polished piece done over a period of time. Being assigned such projects encourages students to work at a higher level and to flex more of their intellectual capacities. It teaches them that good writing is something you craft, not something you spit out at a moment’s notice. And it demonstrates to them that they are capable of producing something they can be proud of.”
The Bellwether report also offers some starting points for considering where generative AI use may still support productive struggle and other positive learning outcomes - as the report notes, just because something is difficult does not make it worthwhile. The evidence is still very new, however, and instructors are still working out how to include the tools productively. The report concludes that education must undergo a shift towards “intentionally embedding motivation, metacognition, and adaptability into the fabric of learning experiences, not treating them as add-ons…[and] articulating which foundational skills still require deep fluency and which may be responsibly supported by tools without compromising developmental integrity.”
This is not the first time that education has undergone a crisis and transformation in the face of new technology. An easily identifiable example is the introduction of easy-to-use, accessible calculators into the math curriculum. But is generative AI to language-based learning what calculators are to math? Many experts say no - first, calculators can only perform operations on already-existing information and cannot generate new ideas or make creative decisions; importantly, this means they will not hallucinate or fabricate information. Second, a model of reasoning is required to understand what to input into a calculator, whereas generative AI tools can often be prompted and produce output without the user doing even basic reasoning work. Still, there is precedent for taking new technologies and adapting learning to supercharge student success. This article from Forbes includes a brief history of technology panic in education, how it has been overcome, and some potential directions for generative AI tools in the classroom.
Concerns about prolific generative AI use in classroom settings do not only run from teachers toward students, though. While there are many helpful ways to use generative AI to improve teaching, personalize learning, and expand your repertoire of instruction (see the November 2024 monthly meeting!), students can also be frustrated by overuse of generative AI by the teachers from whom they hope to receive expertise and attentive instruction. A recent piece in the New York Times described students' frustrations with generic, ChatGPT-esque feedback on their assignments and how they often feel cheated out of the expert instruction they believe they are paying for. In our capacity as instructors, while we may find ways to use generative AI tools to improve our course materials, we cannot use generative AI to replace expertise or personalized engagement with students.
Effectiveness and limitations of AI detection tools
Another proposed solution to the uptick in generative AI use on assignments is to use AI detection tools, offered through services like Turnitin and Blackboard. However, AI-generation detectors are often unreliable - so unreliable, in fact, that some universities are opting out of using them. This choice is motivated in part by concern about falsely accusing students, acknowledging how devastating false cheating allegations can be to an academic career. A recent study investigated the accuracy of GPT detectors, finding that native language and prompt design both drastically affect detection accuracy. Text written by non-native English speakers was misclassified as AI-generated for 61% of the analyzed essays, whereas only 5% of the essays written by native English speakers were misclassified; increasing the complexity of wording in the essays written by non-native English speakers reduced the rate of false positives to approximately 11%. Racial bias has been observed as well, with Black students reporting schoolwork falsely identified as AI-generated at twice the rate of their white classmates (another data point in the trend of generative AI tools recapitulating societal biases, as discussed previously).

False negatives also remain a concern. Prompting for a specific writing style can sharply reduce detection: identification of AI-generated college admission essays dropped from 70% to 3% when the prompt included a writing style, and the same trend, though less drastic, was observed in scientific abstracts. Because these tools work by identifying common characteristics of AI-generated writing, rearranging the generated text substantially reduces detection accuracy, regardless of whether the wording is tweaked by a human or by an AI paraphrasing tool. As humans, we can aim to identify AI-generated text by staying vigilant and aware of common AI indicators in tone and style, such as a preference for em dashes (favored by AI) over en dashes (more often used by humans).
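To make that last point concrete, here is a deliberately naive sketch of a detector that scores text on a few surface cues sometimes associated with AI-style prose. This is not how Turnitin or any real product works; the marker list, weights, and scoring formula are invented purely for illustration. The point it demonstrates is the one from the study above: a score built from recoverable surface patterns drops as soon as the same content is paraphrased.

```python
import re

# Hypothetical surface cues; this marker list and these weights are invented for
# illustration and are not drawn from any real detection product.
AI_MARKERS = ["delve", "furthermore", "in conclusion", "it is important to note"]

def toy_ai_score(text: str) -> float:
    """Return a rough 0-1 'AI-likeness' score based on a few surface features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    lowered = text.lower()
    marker_hits = sum(lowered.count(m) for m in AI_MARKERS)
    em_dashes = text.count("\u2014")  # em dash, a habit often attributed to AI prose
    # Very uniform sentence lengths ("low burstiness") are another surface cue.
    lengths = [len(s.split()) for s in sentences]
    mean_len = sum(lengths) / len(lengths)
    variance = sum((l - mean_len) ** 2 for l in lengths) / len(lengths)
    uniformity = 1.0 / (1.0 + variance)
    score = 0.5 * min(1.0, (marker_hits + em_dashes) / 3) + 0.5 * uniformity
    return round(score, 2)

original = ("It is important to note that photosynthesis converts light into energy\u2014"
            "furthermore, it sustains nearly all life. In conclusion, plants matter.")
paraphrased = ("Photosynthesis turns sunlight into chemical energy. Nearly every food "
               "web on Earth depends on it, which is why plants matter so much.")

print(toy_ai_score(original))     # higher score: the surface cues are all present
print(toy_ai_score(paraphrased))  # lower score: same content, cues written away
```

The specific numbers are meaningless; what matters is that any detector keyed to surface patterns can be driven down by exactly the kind of rewording and paraphrasing described above, which is why these scores should never be treated as proof of misconduct.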
Authors
Julia Dunn, Ph.D. (she/her) (University of Denver)
Hannah Dimmick, Ph.D. (she/her) (University of Colorado Anschutz Medical Campus)
