Spotting the Machine in the Margins

A Non-Tech Guide for Army Professional Military Education Instructors to Detect Artificial Intelligence-Assisted Writing

 

Lt. Col. Patrick Naughton, U.S. Army Reserve

 


A student works on a computer 6 December 2018 at the Command and General Staff School, Fort Leavenworth, Kansas
 

As artificial intelligence (AI) tools become increasingly prevalent in academic settings, instructors within Army professional military education (PME) must develop new strategies to uphold academic integrity and foster intellectual growth. In the absence of AI-detection software, this article identifies ten key indicators that may suggest AI-generated writing is present in student essays, along with two new ways to evaluate students’ individual development. Understanding these indicators can help PME faculty assess written work more critically and safeguard the unassisted and individual development of agile, adaptive leaders prepared for the complexities of future conflict.

AI tools, most notably civilian large language models like ChatGPT, are transforming how students approach writing.1 While these technologies offer potential benefits when used responsibly to generate ideas, brainstorm, clarify research questions, and improve one’s understanding of complex concepts, their misuse threatens the intellectual integrity of the Army’s PME system.

“AI or die” is the latest mantra being tossed around in technology circles.2 It is a battle cry that rings especially true for military professionals. AI can be invaluable in analyzing large datasets and throughout all steps of the military decision-making process, which ultimately seeks to better develop and inform the commander’s visualization of the operational environment. Not embracing the possibilities found within AI would be a monumental folly on our part.

At the same time, Army PME is critical in developing leaders capable of independent, creative, and critical thinking while operating in uncertain and complex environments. This skill is often developed and assessed in PME through original student essays rooted in their study and understanding of the course materials and their own independent professional development. By recognizing the distinct characteristics of uncited AI-generated writing, faculty can protect the unassisted and individual developmental intent of PME while still encouraging the use and understanding of AI as a force multiplier.

As AI-generated text becomes increasingly difficult to distinguish from good human writing, PME instructors must adopt a nuanced and deliberate approach to detecting misuse. The following ten indicators provide a practical framework to support this effort without using AI-detection software.3 Note that all are predicated on instructors knowing their curriculum in detail, and, more importantly, knowing their students and having a set of writing samples to compare their iterative written work against throughout a course. Some of the following indicators are less relevant without this knowledge and background.

1. Lack of Drafting and Revision Evidence

Effective writing is the product of reflection, revision, and engagement with feedback. AI-generated papers are frequently submitted in a single, polished form without evidence of developmental stages. This is particularly notable when instructors request outlines, peer reviews, or multistep writing submissions as part of the curriculum.

An instructor must take note of their students’ writing style when reviewing these presubmissions. Coupled with other writing samples such as emails, journal entries, or discussion posts, they give a solid indication of a student’s writing ability. If the final essay differs significantly from these samples, it is an indication that the text may be AI generated. If students are not good writers, they will also not be good editors, which will manifest itself in a poorly revised AI-generated product.

2. Uncharacteristic Shifts in Style or Vocabulary

A sudden elevation in vocabulary, syntax, or complexity—especially when inconsistent with previous submissions—may suggest the use of external assistance. Significant divergence in grammar or technical language can be a warning sign, particularly in courses that track students’ writing across multiple assignments.

Generally, the overall work will appear out of character when compared to the student’s previous writing samples. For example, if the student never uses em dashes in other submissions and then their final essay contains “—” after “—” in every paragraph, this is another strong indicator of AI-generated text. Do note, however, that civilian large language models will learn a user’s writing style, meaning the more a student uses their AI account, the harder this shift will be to detect.

3. Overly Polished but Generic Tone

AI-generated writing often appears grammatically flawless yet devoid of the individual perspective and personal or professional voice expected of a human writer. The language may resemble a formal journal article or encyclopedia entry rather than a student essay rooted in personal and professional insight and context.

AI image by author

Each student has a personal background rooted in their upbringing and professional experience. A soldier well-versed in maneuver units versus sustainment will use different jargon and doctrinal terms to describe the operational environment. This is similar to someone raised on the East Coast versus someone raised on the West. An author’s unique way of expressing themselves will manifest in their tone, syntax, and choice of words. If this suddenly disappears in a completed essay, it either means they had a strong proofreader or AI wrote the piece. Listening to the paper via the Microsoft Word “Read Aloud” function can help identify a change in tone. Additionally, large language models are being perfected to answer prompts like “make this narrative sound more human and less AI.” This, along with AI personas, digital characters programmed to mimic human personality traits and behaviors, will eventually render this indicator moot.4

4. Inconsistent Depth of Analysis

Surface-level engagement with doctrinal or conceptual material is a common trait in AI-generated essays. Additionally, at times, this analysis is almost overly comprehensive, as if AI is scanning too many doctrinal concepts, both outdated and current. While the writing may appear well-structured, it frequently lacks original analysis or synthesis.

Civilian large language models can access our doctrine; however, they cannot apply practical experience to it. Only a human can take their years of real-world experience and translate it into a deep assessment of doctrinal concepts. The absence of this depth of scrutiny is indicative of AI-generated text.

5. Absence of Contextual or Experiential Relevance

Authentic PME writing reflects the student’s operational experiences, classroom discussions, and understanding of contemporary military challenges. Unless explicitly prompted otherwise, AI-generated content typically omits references to unit-level experience, recent training environments, or professional insights aligned with the Army Operating Concept.

As noted in number three, “Overly Polished but Generic Tone,” the lack of any expressed individual human-level experience, whether personal or professional, is telling and suggests the use of AI-generated text.

6. Incorrect or Fabricated Citations

AI-generated essays may contain citations that are either fabricated or incorrectly applied. Instructors reviewing such papers may encounter doctrinal misapplications or references to nonexistent sources (a.k.a. ghost citations). These errors are particularly noticeable in discussions involving doctrinal or historical publications.

Large language models pull their data from any source they can find online, whether primary or secondary; they cannot differentiate between the two. Sometimes they conflate several sources into one “ghost” citation. Additionally, they do not gauge the accuracy or legitimacy of a source, which often results in suspect citations, to say the least. Lastly, PME encourages the use of sources contained within the curriculum. Papers that rely primarily on outside sources were probably produced by AI.

Often, instructors just glance over endnotes and bibliographies; now, they must examine these areas intently to identify any sources that clearly do not refer to what they are cited for in the text, pull heavily from outside the curriculum, or are clear fabrications. For example, two New York lawyers were sanctioned for submitting a legal brief compiled by AI that included six fictitious case citations. Their incredulous response: “We made a good faith mistake in failing to believe that a piece of technology could be making up cases out of whole cloth.”5

7. Confidently Delivered but Factually Flawed Content

AI writing systems are trained to produce plausible-sounding text but can introduce errors with authoritative confidence. Essays may contain fabricated operations, misquoted doctrine, or ahistorical claims that, while convincing at first glance, cannot withstand doctrinal or historical scrutiny.

A common running joke within the halls of the Command and General Staff School, derived from previously submitted papers, is how a year ago historical records suddenly credited the Iraqi army with defeating coalition forces on the beaches of Kuwait during the First Gulf War, thereby winning the conflict. AI platforms have since corrected this error, but it provides a perfect example of how AI can latch on to historical errors that are then regurgitated as fact.

8. Formulaic Structure and Repetitive Transitions

Language models often rely on predictable templates. Repetitive transition phrases such as “In conclusion,” “It is important to note,” or “This illustrates …” may appear frequently. Additionally, adjectives may now appear in each paragraph in an oddly measured, uniform count. Paragraphs will often follow a rigid structure that feels mechanical and lacks the dynamic organization typically found in well-developed student writing.

As discussed in number two, “Uncharacteristic Shifts in Style or Vocabulary,” if a student has previously struggled with structure and transition sentences and used adjectives sparingly, but their work is now suddenly saturated with them, it is probably another indicator that they used AI-generated text.

9. Misalignment with Prompt Details

AI-generated responses may answer the general topic of a writing prompt while failing to address specific requirements such as comparison of operations, critical evaluation of a doctrinal shift, or integration of course readings. Such misalignment suggests a lack of genuine engagement with the material.

Large language models rely on a succession of prompts to produce a quality essay (persona, task, context, format, and then continuous refinement is a common order). Students unfamiliar with this will simply copy and paste the essay question and then submit the initial response. Since plagiarism, and now the use of AI, can be linked to students facing a looming deadline, often the first response is as far as students will go in their use of AI, which is telling. This indicator will be invisible for those students who use a succession of refinement prompts to produce their essay. Note, however, that repeated prompts can elevate some of the other indicators, such as ghost citations, a shift in vocabulary, and the dearth of a human voice.

10. Irrelevant or Out-of-Place Content

AI models sometimes include content that, while grammatically correct, is tangential or irrelevant. For example, an essay on mission command may unexpectedly include unrelated philosophical digressions or discussions of historical events not tied to the argument or learning objectives.

This directly links to number nine, “Misalignment with Prompt Details”: unguided AI essays will often contain “fluff” unrelated to the topic, especially when a prompt demands a specific page or word count. While instructors are used to seeing this in human-drafted work, AI “fluff” will be confidently written and can trick the reader regarding its relevance. Just as it takes a trained eye to notice when filler is inserted to reach a set count, so it goes for this AI-generated indicator as well.

The use of civilian large language models to write an essay is difficult to prove, even with AI-detection software, which offers little to substantiate an allegation. Additionally, Army PME does not prohibit using AI to draft a paper if properly cited per individually established policies.6 This leaves the instructor as the final judge of the unassisted and individual developmental intent of PME while still encouraging the use and understanding of AI as a force multiplier.

One option presents itself to an instructor who believes an uncited AI-generated paper has been submitted as a student’s original work. A “debrief” can occur in which the student is asked a series of pointed questions about their essay. If the student genuinely developed the piece, they should be able to defend and describe their thought process in assembling the work. If they are unable to do so, the instructor can initiate an integrity check, directly asking the student whether they produced the work solely using a civilian large language model with zero credit given to the source. As a result of this inquiry, the instructor can then extend an opportunity to submit the assessment again, written this time without AI assistance. By holding a student accountable and incentivizing integrity with an opportunity to correct their mistakes, an instructor can help students understand the value of developing leaders of character capable of independent, creative, and critical thinking.

Large language models used to develop written work will only continue to improve. Eventually, it will be impossible to detect their use in written assessments. Academia, and in turn PME, must accept this reality and determine new ways to evaluate a student’s individual development that uphold academic integrity while still building agile and adaptive leaders who can think critically for themselves in stressful environments. There are two ways in which PME can adjust to this new certainty.

First, relying on written essays to evaluate student learning outcomes is obsolete. Instructors should be less interested in “catching” students using AI in a prohibited manner and more interested in crafting assessments that force students to use it productively. The focus should shift toward how a student can take a large dataset (the science of warfare), enter it into AI, guide it through a series of prompts to achieve a desired outcome, and then assess that output for clarity, accuracy, and usability. Finally, the student takes that output and inserts the art of warfare into the final product based on their years of knowledge and experience. This becomes especially important in the military decision-making process when attempting to better develop and inform the commander’s visualization of the operational environment in real time.

Second, robust written essays with distant deadlines must be replaced by shorter written assignments and oral boards to truly test a student’s comprehension. Students must still be able to communicate in writing; however, this skill can be developed and assessed via shorter, timed on-the-spot written assessments. Additionally, absorption and grasp of the curriculum can be further evaluated via oral examination boards (either in person or virtually). Both techniques can accurately gauge the student’s understanding of the material and seek to place them in replicated stressful environments where immediate, succinct, and articulate responses, either in writing or verbally, can mean the difference between life and death on the battlefield. Ultimately, that is a goal of PME, which can no longer be assessed through traditional essay-writing requirements due to the rise of large language models.

As AI tools become more widespread within academic environments, it is imperative that Army PME instructors adapt their methods to preserve academic integrity and promote authentic, unassisted, and individual intellectual development. In the absence of reliable AI-detection software, this article outlined ten practical indicators that may signal the presence of AI-generated content in student writing and two new ways to evaluate students’ individual development. Familiarity with these indicators enables faculty to evaluate written submissions more critically, thereby reinforcing the cultivation of agile, adaptive leaders capable of navigating the complexities of future operational environments.

 

The author is appreciative of the feedback from multiple Command and General Staff School instructors in the drafting of this article: Mr. Dirk Blackdeer, Lt. Col. Charles Burkardt, Mr. Joseph Curtis, Mr. Thomas Goldner, Lt. Col. Scott Grimsey, Mr. Brian Hathaway, Lt. Col. Christopher Layton, Lt. Col. Nathan Lokker, Mr. Michael Mathews, Dr. Richard McConnell, Dr. Vincent Particini, Dr. Ross Pollack, Lt. Col. Andrew Scott, and Lt. Col. Andrew Whitford.

 


Notes

  1. Academia, like professional military education (PME), is struggling to understand how large language models are changing the research and academic landscape as well as how it should be regulated. For some examples, see “How Much Research Is Being Written by Large Language Models,” Stanford University Human-Centered Artificial Intelligence, 13 May 2024, https://hai.stanford.edu/news/how-much-research-being-written-large-language-models; and Jonathan Shaw, “Artificial Intelligence in the Academy,” Harvard Magazine, 3 May 2024, https://www.harvardmagazine.com/2024/05/harvard-and-ai.
  2. “AI or Die” is a popular mantra in tech circles seen across traditional and social media platforms that tout the use of AI; most notably, a popular podcast uses the phrase as its name: “AI or Die Podcast,” hosted by Rehgan Avon, AlignAI, 1 July 2025, https://www.getalignai.com/podcast.
  3. Response to “Can you provide ten ways in which an instructor can tell that a student used AI to write their paper?,” ChatGPT-4o, OpenAI, 30 June 2025. Interestingly, the author asked ChatGPT for ten indicators. The results provided formed the foundation to build upon for this article. The rest was realized through instructor intuition and experience. AI is a fascinating technology that tells on itself if prompted.
  4. Response to “Can you build me an AI Persona?,” ChatGPT-4o, OpenAI, 3 July 2025. If you ask AI to build you a persona, it responds asking for a purpose and role, desired personality traits, what knowledge and expertise it needs, any ethical constraints, and what its format and use should be. It will then generate a persona and give you a full profile with a name or role, backstory, a certain dialogue style and tone, and a knowledge base, all of which is guided by your set rules and limits.
  5. Sara Merken, “New York Lawyers Sanctioned for Using Fake ChatGPT Cases in Legal Brief,” Reuters, 26 June 2023, https://www.reuters.com/legal/new-york-lawyers-sanctioned-using-fake-chatgpt-cases-legal-brief-2023-06-22/.
  6. It is important that students reference and follow their individual PME institution policies on the use of large language models to draft their written assignments.

 

Lt. Col. Patrick Naughton, U.S. Army Reserve, is a medical service corps officer and a military historian who is serving as an Advanced Operations Course instructor with the Department of Distance Education at the U.S. Army’s Command and General Staff College (CGSC). Naughton holds a Master of Military Arts and Science in history from CGSC, where he was an Art of War Scholar, and a Bachelor of Arts in history from the University of Nevada, Las Vegas. He is the author of Born from War: A Soldier’s Quest to Understand Vietnam, Iraq, and the Generational Impact of Conflict (Casemate, 2025).

 

Line of Departure

 


September-October 2025