
Limitations and Ethical Challenges of Using genAI in Research
9/10/2025 12:00 am
This entry continues a conversation from a previous monthly meeting about uses of generative AI for our community. In the first entry in this series, we provided a "cheat sheet" on which generative AI tools work best in different situations (and when you might want to consider switching to a new tool!), and in the second entry we shared tips and tricks for using AI in your workflow.
However, the use of generative AI in our research, teaching, and personal lives has limitations and raises ethical questions. Here we review some of the issues surrounding the use of genAI in research, along with perspectives from some of our members.
Should AI be generating scientific content?
As researchers, we have a moral responsibility to maintain the integrity and credibility of research in order to preserve public trust in science. Generative AI, however, is not a moral agent, and therefore cannot care whether it generates material that is false, plagiarized, or not credible. When AI generates content, it is regurgitating its training data; it cannot add a new perspective or conclusion. AI-generated text may plagiarize that training data by copying it directly, paraphrasing it too closely, or presenting ideas as original without citation. AI companies are not forthcoming about the sources used for content generation, nor do AI tools always provide accurate citations. Since the rise of generative AI, lawsuits have arisen over the use of copyrighted material for training, especially without permission. The New York Times sued both OpenAI and Microsoft for using millions of its articles to train their models, while three authors sued Anthropic for training its model on pirated writings without permission and without paying the authors for their work. In Anthropic’s case, a district court found the use ‘transformative’ and therefore covered by ‘fair use.’
Concerns about plagiarism and accuracy are central to the question of whether generative AI should be used to write scientific papers. In a recent poll, researchers were asked when AI is or is not appropriate to use in preparing scientific papers. One third of researchers agree it is appropriate to use AI as an editor, and an additional third agree as long as the use is disclosed, but far fewer feel it is appropriate to use AI as a translator. This level of acceptance may reflect the fact that editing already written material reduces the likelihood of generating false information or committing plagiarism. Interestingly, only one fifth of the researchers report having used AI to edit their own papers. When it comes to using generative AI to draft a paper, however, more than one third consider it inappropriate, and two thirds of the researchers surveyed say it is never appropriate to use AI to draft a results or discussion section. In the same poll, researchers were asked about the review process. Overwhelmingly, respondents agree that there are no circumstances under which using AI for peer review is appropriate. This is a hopeful finding, as Elsevier states that peer review can only be completed by humans because of the critical thinking and assessment required. Additionally, manuscripts under review should be treated as confidential, and uploading them into an AI tool could violate authors' confidentiality, proprietary, or data privacy rights.
I don't tend to use AI to gather data about my own research or writing at all. That feels like it is "cheating". I know that is only my perception, but I worked really hard to be able to appraise the literature and the data I collect. I think relying on AI to do that takes the creativity and critical thinking elements out of writing. I think it is unavoidable, though, so I challenge my students to make sure that if they do ask AI about things, they also ask where the information it gave them came from and determine whether that is a credible source. I also tell them they should ask for all the information that can be found that opposes what AI told them. That way, people can make informed decisions about the information. – Erin Lally
Hallucinations and Incorrect Information
We hear constant warnings about double-checking the work of generative AI because “hallucinations” (a phenomenon in which an LLM perceives patterns or objects that are nonexistent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate) frequently crop up in its outputs. But why are these hallucinations produced, and can we ever trust these tools to provide factual information? One recent article, with a flair for fun language, even argued that AI tools are not “lying” or “hallucinating,” but instead “bullshitting.” The authors’ thesis was that “because these programs cannot themselves be concerned with truth, and because they are designed to produce text that looks truth-apt without any actual concern for truth, it seems appropriate to call their outputs bullshit.” Indeed, generative AI tools have no motivation for their speech; they simply “produce human-like text…by estimating the likelihood that a particular word will appear next, given the text that has come before.” This is not the same as cognition or reasoning, and therefore these tools cannot be expected to produce truthful results. Furthermore, when asked to provide sources or explain errors, this “bullshit” often snowballs into greater factual errors. These misstatements also seem to be getting worse, with no obvious explanation as to why.
Accuracy can be improved by using LLM tools trained specifically on scientific literature (such as Phind, Nouswise, or Consensus), but because generative AI tools cannot deeply understand nuance or context, errors can arise even when they are given high-quality information. Ultimately, these issues are built into the structure of the models, so they “will never go away.”
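To make the “next word” point concrete, here is a minimal sketch (ours, not from the article quoted above) of what estimating the likelihood of the next word looks like in practice. It uses the openly available GPT-2 model through the Hugging Face transformers library; the model choice and prompt are purely illustrative, and larger LLMs work the same way at this level.

```python
# Minimal sketch of next-token prediction (illustrative only).
# The model scores every token in its vocabulary by how likely it is
# to come next -- plausibility given the prompt, not truth.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The most cited paper on exercise physiology is"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, sequence_length, vocab_size)

next_token_logits = logits[0, -1]          # scores for the next position only
probs = torch.softmax(next_token_logits, dim=-1)

# The five most probable continuations: the model will happily continue
# this sentence whether or not a true answer exists in its training data.
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>12}  p = {prob.item():.3f}")
```

Repeatedly sampling from distributions like this one is all that text generation is; there is no separate step in which the model checks its output against reality, which is why the fabricated citations described below can look so fluent.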
There have been multiple recent high-profile examples of unchecked hallucinations making their way into the public eye, most notably in the US with the publication of the “MAHA Report,” which was riddled with fake links and misapplied citations. The FDA’s new drug-approval AI tool has also been found to invent citations and evidence during the processing of pharmaceutical applications. While these high-profile cases raise serious concerns about government policy on generative AI, hallucinations and misinformation are also becoming rampant in the more “everyday” scientific literature many of us engage with and contribute to. These errors, often born of zealous “publish or perish” policies at universities that can overwhelm researchers, contribute to misinformation and distrust in experts. Using generative AI to create published materials may help you find sources and context outside your typical “bubble,” but rigorous fact-checking, to ensure that the citations exist and are presented ethically, is critical. Although many researchers are overworked and underfunded, that is no excuse to sacrifice scientific accuracy for speed.
While AI is incredibly helpful, it’s important to use it critically and not accept everything it says or creates for you. I had an experience where it made up citations for a topic I asked about, which made me more cautious, especially when using AI for scientific writing or literature searches. Verifying AI-generated content is essential to ensure accuracy and reliability. – Asal Aflatounian
Authors
Julia Dunn, Ph.D. (she/her) (University of Denver)
Hannah Dimmick, Ph.D. (she/her) (University of Colorado Anschutz Medical Campus)