AI content detection has been a major topic of discussion since the launch of ChatGPT, particularly in educational settings, where students frequently use the chatbot to complete homework and assignments. This surge in AI-generated work has left many professors and teachers feeling frustrated as they struggle to maintain academic integrity.
Over the past few months, I’ve been tracking several of these AI content detectors. My focus was primarily on their ability to identify AI-generated content rather than plagiarism. The “industry standards” are GPTZero and Originality.ai, but several other tools, such as ZeroGPT, Quillbot, and Undetectable.ai, have also entered the fray.
With surprisingly minimal effort, I’ve managed to bypass the detection algorithms of all these tools.
In this article, I’ll explain how I did so and why the process isn’t as complicated as you might think.
Conflict of Interest
Each of these detection tools claims to be the "best" at identifying AI-generated content, but they also offer services to "humanize" that same content to help it evade detection. This presents a clear conflict of interest: they sell solutions to the very problem they claim to solve.
Tons of content has been produced comparing these tools, analyzing their accuracy, and ranking them on different factors. However, I have always believed this exercise is quite futile. Even if a tool excels at detection for a time, its lead won’t last long. Each new update to a language model forces these detectors to play catch-up, making it a perpetual cat-and-mouse game. Their effectiveness fluctuates, and many users report both false positives and false negatives, which highlights their unreliability.
The Paradox of AI-Detecting-AI
Most of these detection tools are AI-based themselves, which creates a paradox. Using AI to detect AI-generated content can never be 100% accurate, especially as language models continue to improve. This makes it increasingly difficult to distinguish between human and machine-produced text.
To understand why these tools struggle, it's essential to look under the hood at how they operate. Once you grasp the mechanics, bypassing their algorithms becomes much simpler. And that’s precisely what I’ve achieved, with consistent success.
How AI Content Detectors Work
AI detectors, such as GPTZero, rely on several key methods to identify AI-generated text. Here’s a breakdown of how they function:
1. Perplexity Analysis
Perplexity measures how "surprising" or unpredictable a piece of text is to a language model. Text that aligns closely with predictable patterns might be flagged as AI-generated, while more nuanced and unpredictable language often indicates human authorship.
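To make the idea concrete, here is a minimal sketch of a perplexity calculation. It is an illustration under stated assumptions, not how any particular detector is implemented: a real tool would presumably score text with a large neural language model, while this sketch swaps in a tiny add-one-smoothed unigram model built from a reference text. The function names (`tokenize`, `unigram_perplexity`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def unigram_perplexity(text, reference_text):
    """Perplexity of `text` under a unigram model fitted on `reference_text`.

    Assumption: a unigram model with add-one smoothing stands in for the
    language model a real detector would use.
    """
    ref_tokens = tokenize(reference_text)
    counts = Counter(ref_tokens)
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    total = len(ref_tokens)

    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text must contain at least one token")

    log_prob_sum = 0.0
    for tok in tokens:
        # Add-one smoothing keeps unseen words from getting zero probability.
        prob = (counts[tok] + 1) / (total + vocab_size)
        log_prob_sum += math.log(prob)

    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob_sum / len(tokens))


reference = "the cat sat on the mat the dog sat on the rug"
print(unigram_perplexity("the cat sat on the mat", reference))
print(unigram_perplexity("quantum zebras juggle violet equations", reference))
```

The first call scores lower than the second because every word in it is familiar to the model. A detector built on this signal would treat the low-perplexity text as the more "machine-like" of the two.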
2. “Burstiness”
Burstiness refers to variations in sentence length and structure. Human writers naturally vary their sentence structures, creating a dynamic flow. AI-generated text can be more uniform, lacking this variation, and detectors use this pattern to identify possible AI origins.
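One simple way to put a number on this is the spread of sentence lengths relative to their average. The sketch below is an assumption about how such a score could be computed, not a documented formula from any detector; the name `burstiness` and the choice of the coefficient of variation are illustrative.

```python
import re
import statistics


def sentence_lengths(text):
    """Word counts per sentence, splitting on ., ! and ? (a rough heuristic)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence lengths (0.0 means fully uniform)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The sky is blue. The sea is deep. The sun is hot."
varied = "Stop. The storm rolled in over the hills before anyone had time to react. We ran."
print(burstiness(uniform), burstiness(varied))
```

The uniform sample scores exactly zero because every sentence has the same length, while the varied sample scores well above it.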
3. Repetition and Redundancy
AI models may inadvertently repeat phrases or ideas, a sign that detectors can pick up on. By analyzing repetitive elements, detectors can identify AI-generated content that lacks genuine variation.
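A rough sketch of a repetition check follows. It measures the share of three-word sequences in a text that occur more than once. The metric and the name `repeated_ngram_ratio` are assumptions chosen for illustration; actual detectors do not publish the signals they use.

```python
from collections import Counter


def repeated_ngram_ratio(text, n=3):
    """Fraction of n-gram occurrences that belong to an n-gram seen more than once."""
    tokens = text.lower().split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0  # text shorter than n words
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)


print(repeated_ngram_ratio("it is important to note that it is important to note that"))
print(repeated_ngram_ratio("the quick brown fox jumps over the lazy dog"))
```

A higher ratio means more recycled phrasing. On its own this is a weak signal, since human writers repeat themselves too.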
4. N-gram Analysis
N-grams are sequences of words (bigrams for two-word sequences, trigrams for three, etc.). Detectors analyze these to check how often certain phrases appear. Overly predictable n-grams often signal AI authorship, as models rely on common word pairings.
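As a hedged illustration, the sketch below extracts bigrams and measures how many of a text's bigrams fall among the most common bigrams of a reference text. The reference text, the `top_k` cutoff, and the function names are all assumptions made for the example; a production detector would presumably rely on far larger corpora and different statistics.

```python
from collections import Counter


def ngrams(tokens, n):
    """All consecutive n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def common_ngram_share(text, reference_text, n=2, top_k=5):
    """Share of the text's n-grams that are among the reference's top_k n-grams."""
    ref_counts = Counter(ngrams(reference_text.lower().split(), n))
    common = {gram for gram, _ in ref_counts.most_common(top_k)}
    grams = ngrams(text.lower().split(), n)
    if not grams:
        return 0.0
    hits = sum(1 for gram in grams if gram in common)
    return hits / len(grams)


reference = "it is important it is clear it is important"
print(common_ngram_share("it is important", reference, n=2, top_k=2))
print(common_ngram_share("zebras juggle equations", reference, n=2, top_k=2))
```

A share close to 1.0 means the text leans almost entirely on stock word pairings, which is the pattern this method is meant to surface.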
5. Style and Tone
AI-generated writing can sometimes feel too consistent or polished. Detectors look for stylistic markers that might suggest a lack of nuance or creativity, elements that typically characterize human writing.
6. Training Model Comparisons
Some advanced detectors compare the input text against a database of known AI-generated content. They can also run the text through similar models to see if they produce matching outputs, which would suggest AI authorship.
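The database comparison can be pictured as a similarity search. The sketch below is only an assumption about one way to do it: it compares overlapping three-word "shingles" using Jaccard similarity against a small in-memory list of samples. The names `shingles`, `jaccard`, and `best_match`, and the sample list itself, are hypothetical.

```python
def shingles(text, size=3):
    """Set of overlapping word sequences of the given size."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: intersection size over union size."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(text, known_ai_samples, size=3):
    """Highest similarity between the text and any known AI-generated sample."""
    target = shingles(text, size)
    return max(
        (jaccard(target, shingles(sample, size)) for sample in known_ai_samples),
        default=0.0,  # empty database: nothing to match against
    )


samples = ["global warming is a serious threat", "cats sleep most of the day"]
print(best_match("global warming is a serious threat", samples))
```

A score near 1.0 means the input closely matches something already in the database. Even light rewording lowers the score quickly.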
7. Contextual Understanding
Detectors analyze coherence and logical flow. AI-generated text can be semantically accurate yet lack deeper contextual understanding, creating subtle logical gaps. Detectors flag content that is technically correct but contextually off.
By combining these approaches, AI detectors like GPTZero and Originality.ai attempt to distinguish between human and AI-generated content. But understanding how these tools work makes it much easier to outsmart them.
AI Detectors, Defeated
Armed with this knowledge, I’ve found it surprisingly simple to bypass their algorithms. Using a fairly simple prompt (just 11 lines long!), I produced AI-generated text that these tools rate as 100% human.
Here is an example of some ChatGPT-generated text describing global warming. Originality.ai successfully detects it as 99% AI:
I then regenerated the text using my prompt, and it now scores as 100% human content (!!):
I then ran the same test on GPTZero with the original text from ChatGPT. Not surprisingly, it detected the text as AI-generated, albeit with less confidence than Originality.ai, at just 76% probability:
I then gave it the regenerated text from my prompt and, surprise, it’s suddenly 98% human:
If you’re curious to try it for yourself, I made it available for free on the GPT store.
Simply paste any AI-generated content, and the tool will modify it to appear more human. You can then verify the results by running the output through your preferred AI detection tool.
The quest to detect AI-generated content is like trying to hit a moving target. Every advance in language models pushes detection tools to adapt, creating a never-ending game of cat and mouse.
PS: If you’d like to see the prompt, feel free to subscribe to my sub and you’ll get a link which contains it, as well as all the other prompts from my previous posts!