Perplexity and Burstiness in AI and Human Writing: Two Important Concepts
The original article was written by The Jasper AI Whisperer on Medium.com. It is reproduced below for ease of access.
Check all AI results for accuracy, fairness, and potential harm. Ultimately, it’s human oversight that safeguards responsible use.
Perplexity
Perplexity is a measure used to evaluate the performance of language models. It describes how well a model predicts the next word in a sequence of words. As you’ll probably know by now, AI-generated text is produced sequentially, one word (strictly speaking, one token) at a time. In a common setup, the AI picks each next word from a weighted shortlist of the K most probable candidates, an approach known as top-k sampling.
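As a rough illustration of that "pick from K weighted options" step, here is a minimal Python sketch. The function name sample_top_k and the probabilities are made up for this example; a real model would score tens of thousands of candidate tokens, and many systems layer other sampling tricks on top.

```python
import random

def sample_top_k(word_probs, k, rng=random):
    """Pick the next word from the k most probable candidates."""
    # Keep only the k highest-probability candidates.
    top = sorted(word_probs.items(), key=lambda item: item[1], reverse=True)[:k]
    words = [word for word, _ in top]
    weights = [prob for _, prob in top]
    # random.choices draws in proportion to the weights,
    # so the shortlist does not need to be renormalized first.
    return rng.choices(words, weights=weights, k=1)[0]

# Invented probabilities for the next word (illustration only).
next_word_probs = {"school": 0.55, "the pool": 0.25, "home": 0.15, "icicle": 0.05}

print(sample_top_k(next_word_probs, k=2))  # either "school" or "the pool"
```

With k=2, the unlikely candidates can never be chosen, however many times you run it. Raising k lets stranger words through.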
Perplexity is based on the concept of entropy, which is a measure of the uncertainty or randomness in a system. A lower perplexity score indicates that the language model is better at predicting the next word in a given sequence, while a higher perplexity score indicates that the model is less accurate. Basically, the lower the perplexity, the more predictable the text is to the model. When it is measured on text the model has not seen before, low perplexity indicates better generalization and performance.
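If you like seeing the arithmetic: perplexity is usually computed as the exponential of the average negative log-probability the model assigned to each word that actually appeared. The sketch below assumes you already have those per-word probabilities. The two lists are invented for illustration, not taken from any real model.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Invented probabilities a model might assign to each word that actually appeared.
confident = [0.9, 0.8, 0.95, 0.85]  # the model saw each word coming
surprised = [0.1, 0.05, 0.2, 0.02]  # the model was caught off guard

print(perplexity(confident))  # close to 1: very predictable
print(perplexity(surprised))  # much larger: hard to predict
```

A handy way to read the number: a perplexity of 4 means the model was, on average, as unsure as if it were choosing between 4 equally likely words.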
As a really rough example, how do you think this sentence should end?
“I picked up the kids and dropped them off at…”
A language model with high perplexity might propose “icicle”, “pensive”, or “luminous” as answers. Those words don’t make sense; it’s word salad.
Somewhere in the middle might be “the President’s birthday party”. It’s highly unlikely but… I guess it might be plausible, on rare occasions?
But a language model with low perplexity might answer “school” or “the pool”. That’s an accurate, correct prediction of what likely comes next 🔮
As you can see, there are varying degrees of plausibility in the output.
(BTW, note how, by accurately predicting language, AI gives the appearance, erroneously, of being factually accurate as well. Don’t fall for this fallacy.)
Perplexity is commonly used in NLP tasks such as speech recognition, machine translation, and text generation, where the most predictable option is usually the correct answer. For writing generic content that’s intended to be standard or ordinary, lower perplexity is the safest bet.
Face it: most of the time, what we humans say and write is usually pretty boring! It’s easy to calculate what word comes… [next? after? tomato?]
Burstiness
Burstiness basically measures how predictable a piece of content is, judged by how uniform the length and structure of its sentences are throughout the text. In some ways, burstiness is to sentences what perplexity is to words.
Whereas perplexity is the randomness or complexity of the word usage, burstiness is the variance of the sentences: their lengths, structures, and tempos. Real people tend to write in bursts and lulls: we naturally switch things up and write long sentences or short ones; we might get interested in a topic and run on, propelled by our own verbal momentum. Like I did^
AI is more robotic: uniform and regular. It has a steady tempo, compared to our creative spontaneity. We humans get carried away and improvise; that’s what captures the reader’s attention and encourages them to keep reading.
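One simple way to put a number on this is to measure how much sentence lengths vary relative to their average. The sketch below does that with the coefficient of variation. To be clear, this is just one rough proxy chosen for illustration, not the formula any particular detector uses, and the naive sentence splitter will trip over abbreviations like "Dr."

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Standard deviation of sentence length divided by the mean length."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = "Stop. I picked up the kids and dropped them off at school before lunch. Why?"

print(burstiness(uniform))  # 0.0: every sentence has four words
print(burstiness(varied))   # above 1: lengths swing from 1 word to 13
```

Higher scores mean more bursts and lulls. Structure and tempo are harder to capture than length, so treat this as a first pass only.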
How can I tell if a text was generated by an AI?
There are measures of burstiness and perplexity that are commonly used in natural language processing (NLP) and machine learning. To calculate them, you would need a natural language processing tool or library. But human intuition will work in a pinch. Look at how much the sentence structures vary, and divide the number of unique words in a sentence or passage by its total number of words (a figure sometimes called the type-token ratio).
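The unique-words-over-total-words count is easy to automate. Here is a minimal sketch. The function name is made up, the tokenizer is deliberately crude (lowercase letters and apostrophes only), and the ratio naturally drops as texts get longer, so only compare passages of similar length.

```python
import re

def type_token_ratio(text):
    """Number of unique words divided by the total number of words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)

# "the" appears twice, so there are 5 unique words out of 6.
print(type_token_ratio("the cat sat on the mat"))
```

A ratio near 1 means almost every word is different; a low ratio means a lot of repetition.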
You can finally put your college degree in literature to use! Judge writing. Is it interesting? Does it meander or stay on a topic too much? Are there any interesting words—or ones that seem out of place? These are all questions you can use to evaluate the perplexity and burstiness in a piece of writing.
There are also AI-based content analyzers like Originality.ai and GPTZero, if you’d like a more accurate assessment. It’s like the Space Race: pitting language model algorithms against each other in an escalating Cold War.
It’s important to remember that while AI-generated content can lack the variability of human content, that doesn’t mean that AI-generated text is completely devoid of entertainment. There are many examples of people producing unique, exciting content with AI. And if I’m perfectly honest, I find Ernest Hemingway’s writing to be low in perplexity and burstiness!