Like everyone else, I’ve been watching the rise of “generative AI” with both interest and trepidation. (“Generative AI” is software that creates “new” text (ChatGPT) or images (DALL-E) from a user prompt – I’ll explain the quotes on “new” later.) Now, I know only a smattering about how generative AI works, so don’t expect technical insights here. But I’ve noticed an interesting gap between what I think these systems are doing and how people are reacting to them.
My interest in generative AI, especially text generators, is easily explained and probably obvious. Since I was in high school I’ve watched software get very slowly better at imitating the kind of writing humans do with great effort, and the kind of conversational interaction that humans do without a second thought.* The latest round is, superficially, really impressive: it can chatter pleasantly about nothing much, write a poem, program in R,** write an essay about Canadian history, explain linkage disequilibrium, and more. Or at least, it often looks like it can.
That last bit is what I find particularly interesting. You can google plenty of examples of folks asking ChatGPT something and either being impressed or disappointed by the results. Their reactions can teach us something, I think. In the last few days I’ve seen two examples close to my own interests, both courtesy of Jeff Ollerton. First, Jeff followed up on my post from last week on rounding numbers by asking ChatGPT how to do it. ChatGPT’s answer was smooth, polished, and repeatedly wrong. Then, Jeff posted more about ChatGPT yesterday, based on his asking the software “What did Erasmus Darwin say about birds visiting flowers?” ChatGPT’s answer was interesting: a sentence or two about Erasmus Darwin, and then a quote from a poem called “The Loves of the Plants”. That’s a real poem that Darwin wrote – but the quote, Jeff discovered, was completely made up.
Both examples, I think, point directly at what one can, and cannot, expect from the current generation of text-generative AI, and ChatGPT in particular. ChatGPT doesn’t “know” anything about the answer to your question, whatever it might be. What it does know is what other people have written in texts involving the words that appear in your question – and the style in which they have done so. It’s a very fancy version of the autofill in your cell phone’s texting app. If you’re lucky, many other people have written correct answers to the question you’re asking, and so you’ll get a version of that. But if not – if nobody has written an answer, or people have disagreed about what it might be – then you’ll get plausible-sounding text that has little to do with the real answer to your question. This seems to surprise some folks, but I think it’s exactly what you should expect, if you realize that ChatGPT isn’t answering your question, it’s stringing together words and phrases that often appear in proximity to the words in your question. And, of course, mimicking the kind of style in which those words and phrases are written.
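If you want the autofill analogy made more concrete, here’s a toy sketch in R. Everything in it (the three-sentence corpus, the little functions) is invented for illustration, and it bears no resemblance to ChatGPT’s actual machinery; but it makes the same basic move, suggesting the next word purely by asking what usually follows, with no notion of whether the result is true.

# Toy "autofill": suggest the next word by counting what usually follows it.
# The corpus and functions are made up purely for illustration.
corpus <- c(
  "the p value is the probability of the data given the null hypothesis",
  "the p value is not the probability that the null hypothesis is true",
  "round the p value to two decimal places"
)

make_pairs <- function(sentence) {
  w <- strsplit(tolower(sentence), "\\s+")[[1]]
  data.frame(word = head(w, -1), next_word = tail(w, -1))
}
pairs <- do.call(rbind, lapply(corpus, make_pairs))

suggest_next <- function(current_word) {
  follows <- pairs$next_word[pairs$word == current_word]
  if (length(follows) == 0) return(NA_character_)
  names(sort(table(follows), decreasing = TRUE))[1]  # most frequent follower wins
}

suggest_next("null")  # "hypothesis" – plausible because it's common, not because it's checked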
The problem (if you’re really trying to learn something from ChatGPT’s answers) is knowing whether you got a correct answer or a plausible and confident incorrect one. And this is tricky, because ChatGPT is impressively skilled at being plausible and confident (with its degree of confidence poorly correlated with its degree of correctness). And this is where Jeff’s post is very clever: he tested ChatGPT on something relatively obscure but which he knew well. And from ChatGPT’s performance on that, he inferred something about its trustworthiness in general.
Fact-checking the familiar is something we should do all the time. When I’m doing a new kind of statistical analysis (or an old one using new software), I always start with a small fake dataset for which I know the correct answer. Once I see that the analysis performs the way I expect it to, I can confront it with real data and trust the result. When I’m reading a book or a newspaper story, I look for things I know and ask whether the writer got them right. (Just to show you how very weird I am, not once but twice I’ve blogged about biographies that make mistaken claims that person X named a species after himself. I have weird expertise.) Jeff’s post about ChatGPT demonstrates this approach nicely – and I think doesn’t just show that ChatGPT can often be wrong, but (together with many similar attempts you can find with a quick search) it helps establish the kind of wrong ChatGPT is likely to be, and thus what it’s really doing under the hood.***
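For anyone wondering what that fake-dataset check looks like in practice, here’s a minimal sketch in R, using a simple linear regression as a stand-in for whatever analysis you’re actually learning. The numbers are made up; the point is that because I built the data, I know what answer the analysis should return.

# Simulate data with known parameters, then check the analysis recovers them.
set.seed(42)
x <- 1:100
y <- 2 + 0.5 * x + rnorm(100, sd = 3)  # true intercept = 2, true slope = 0.5

fit <- lm(y ~ x)
coef(fit)     # estimates should land close to 2 and 0.5
confint(fit)  # and the 95% intervals should (usually) cover the true values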
The kind of wrong that ChatGPT tends to be is, I think, relevant to the will-students-use-it-to-cheat worry. I teach scientific writing. Am I worried about students using ChatGPT to write their assignments? Not yet (I think). My prediction is that, asked to write a scientific paper, ChatGPT will write something that sounds a lot like our current scientific literature – that is, something that sounds science-y. The text will use primarily, or only, the passive voice. It will be festooned with acronyms. It will have long, complex sentences that need to be read three times to be decrypted. What else would you expect, given that ChatGPT doesn’t know anything – it only knows what other people have written about things? In a way, ChatGPT will have (I predict) the same problem many beginning scientific writers do: it “learns” to write scientific papers by “reading” other ones and imitating what it sees. Given the state of (most of) our scientific literature, this is a good way to produce poor writing. So: if a student hands in a ChatGPT draft, I predict it won’t fare well in grading. If they start with a ChatGPT draft and fix it, I think they’ll be demonstrating that they’ve learned what I have to teach.****
Will generative AI improve? Definitely. But it won’t be writing my papers. Or my blog posts. Lots of articles about ChatGPT wait until the end, and then reveal that the author didn’t actually write the article – ChatGPT did. Not this time. For better or worse, you’re stuck with me.
© Stephen Heard January 10, 2023
UPDATE: If you’re intrigued by my take on AI, then you’ll really like this podcast with Ezra Klein and Gary Marcus. Marcus knows his stuff (and his take is similar enough to mine that I’m reassured). I think I’m glad I found this only after writing my post; it’s so much better it would have dissuaded me from writing anything!
Image: another good example of how little ChatGPT “knows”. While I could be flattered, even without the vague sycophancy there’s a giveaway: this is the second prompt-and-response exchange I had with ChatGPT about me. The first was simply “Who is Stephen Heard?”, and the answer was “I’m sorry, but I am unable to find any information about a person named Stephen Heard”. On balance I think I find this comforting.
*Although in my case, not particularly well. Small talk is not one of my strong points, as you’ll know if you’ve ever bumped into me cowering in a corner at a party.
**Of course, being able to program in R without googling Stack Overflow is an excellent way for a computer to fail a Turing test.
***Well, approximately. One of the fascinating things about AI models, neural nets, and the rest is that not even the people who invented them really know how they work. How cool (and perhaps troubling) is that?
****Remind me to re-read this post in a few years and see if I still think this way.
“ChatGPT is impressively skilled at being plausible and confident (with its degree of confidence poorly correlated with its degree of correctness).”
Just as in science and other walks of life with humans: the loudest and most confident voice in the room often isn’t the most knowledgeable. Spooky stuff. And fascinating.
A student from Princeton wrote an alternative AI algorithm (GPTZero) to detect whether something was in fact written by ChatGPT or not. The fascinating part of this is a potential arms race if the people behind ChatGPT in turn were to use the GPTZero results to optimise ChatGPT’s algorithm to avoid detection. If Zero uses Chat’s new output as input, we get a vicious circle where both algorithms increase each other’s accuracy…
Thanks for the post and to Jeff Ollerton for doing a test. So, I did what you did in reverse: I first prompted ChatGPT to introduce me and got the same sort of sycophantic response you got (well, it didn’t say “long list of accolades”). Then I asked “Who is Arne Mooers?” …and got “I apologize, it seems that I was mistaken in my previous response. I am not aware of any evolutionary ecologist named “Arne Mooers” working in Canada or in general…”
So, we are still safe.
LOL, we are (in ChatGPT’s eyes) interchangeable!
“The text will use primarily, or only, the passive voice. It will be festooned with acronyms. It will have long, complex sentences that need to be read three times to be decrypted. ”
In other words, a close approximation to the thesis chapters copied and pasted into the submission software of journals, and then inflicted on me to review.
In our department a student has already tried to pass off an epigenetics essay written by ChatGPT as their own work. Got caught out with the GPTzero software. This could become a real headache for academics.
I share your, and Jeff Ollerton’s, impression that ChatGPT is best described as a bullshitter in Frankfurt’s sense. One way you can tell is its tendency to invent stuff–citations as well as quotations. It also seems to repeat common mistakes. For instance, if you ask it to define “p-value”, it will often get it wrong, by offering up some common-yet-mistaken definition.
Like you, I’m impressed by how good it is in many contexts, and also not *that* worried (yet!) about students using it to cheat.
I’ve tried ChatGPT as well and published several blog posts about it. In my latest rant, I linked to your post as additional support for my argument that, in its current state, ChatGPT will not replace writers (and it doesn’t “chat” well either).
I decided to follow your example and the result was quite funny:
Who is the ecologist Pavel Dodonov?
Pavel Dodonov is a Russian ecologist and conservationist. He is the founder and director of the Russian Ecological Society, a non-governmental organization that works to protect the environment and promote sustainable development in Russia. He is also a professor at Moscow State University and has authored numerous books and articles on ecology and conservation.
Congratulations! 🙂
Turns out I was using a different AI, not ChatGPT, but it was well worth the laughs 🙂