…her speech is nothing,
Yet the unshaped use of it doth move
The hearers to collection; they aim at it,
And botch the words up fit to their own thoughts
– “Hamlet”, William Shakespeare
This site, Do The Words, is for people whose writing helps others do their jobs, particularly in the context of leading or supervising. As a leader, you often have to communicate through messages, documents, and presentations, and your success in communication depends on how you write these things. It’s hard: a presentation that’s just a collection of concepts won’t change what people do. A document that’s hard to read will sit in storage, unused. A message that doesn’t show prior listening will leave its recipients frustrated or confused.
Lately, we both hope and worry: will AI assistants relieve the pain of writing? Will robot writers replace part of what we’re paid for? (Easier work may mean fewer jobs.)
If your writing clearly helps people decide and act, and your role is recognized, there’s good and bad news:
- The good news is that thoughtful writing remains a human thing (evidence and argument later). If thought is valued in your job, you'll still be required to write (though some data representation tasks will be automated).
- The bad news is also that you'll still be required to write. AI won’t relieve the pain of writing well, and might in fact distract us from good writing.
In this post, I explain why I think so. To be clear, I'm looking specifically at AI putting words on a page – not at AI in background tasks such as research and summarization, which writers also need to do.
Let's look at what AI writing can do through the lens of our encounters with it, backed up by references on how it actually works.
How did many people encounter large language models?
For a while now, some sports and financial reports have been “written” by machines. If you need a feed of stock updates or match results, programmatic rules can generate language to wrap them in – nothing fancy or descriptive, no great insights, but a functional account of the data that changed, in regular human-readable paragraphs.
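To make that rule-based approach concrete, here is a minimal sketch of how such a report generator works – no learning involved, just a template filled in from structured data. The field names and the template are illustrative, not taken from any real news feed:

```python
# Rule-based "robot journalism": a fixed template wraps structured
# data in a functional, human-readable sentence (illustrative fields).
def stock_update(record):
    direction = "rose" if record["change"] >= 0 else "fell"
    return (
        f"{record['name']} shares {direction} "
        f"{abs(record['change']):.1f}% to close at "
        f"${record['close']:.2f} on {record['date']}."
    )

print(stock_update(
    {"name": "Acme", "change": -2.5, "close": 41.20, "date": "3 May"}
))
# Acme shares fell 2.5% to close at $41.20 on 3 May.
```

Nothing fancy, as the post says – but for a feed of regular, well-structured updates, this kind of template is reliable in a way that statistical text generation is not.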
That’s not really AI as most people know it, though. The general sense of AI is some machine cleverness that goes beyond simple rules, most typically deep learning. The best known form of deep-learning-driven AI writing cleverness (at the time of writing) is the GPT family of large language model (LLM) applications. Like the earlier programmatic approaches, LLMs can also help greatly in representing data. However, their initial relationship with facts has been troubled.
Through ChatGPT, many of us marvelled at the circus tricks of a human-esque assistant who patiently answered our questions and facetious prompts with a flow of language, some quite surprisingly on-topic.
If we had to do serious work with it, though, it could be surprisingly wrong. It would confidently quote reference sources for its information that were entirely made up. Sources with plausible but fake titles, and even made-up URLs. That’s because instead of really understanding what it was saying, the model was trying to construct plausible sequences of words. This thread gives a simplified example of how that worked:
This way of making things up is sometimes derogatively called “spicy autocorrect”, but in fact it’s more like making plausible substitutions in a multidimensional game of Mad Libs. See for example this “could actually work in some parallel universe” medical application of a snack!
“Multidimensional game of Mad Libs” is the best simple explanation I can manage largely because of my own developing understanding of the domain. But if you'd like a very detailed explanation of what’s going on, the best I've found is this one by physicist, computer scientist, mathematician and entrepreneur Stephen Wolfram:
What data tasks can AI assistants really help with? A more promising application is for some tedious data manipulation tasks, which work well with the GPT tools’ readiness to swap around information. If you need to transform some simple data from one format to another, they’re quite good, though of course you should check the result. In this respect they're a bit like a human writer who's cruising through a tedious task – like me when I'm copying & pasting while listening to dance music. Here's an example:
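It's worth remembering that many of these transformations are deterministic, so a few lines of code can do the same job – or serve to spot-check the AI's output. A sketch, assuming a simple CSV-to-JSON case like the ones a GPT tool handles well:

```python
import csv
import io
import json

# The same simple transformation a GPT tool might be asked to do:
# CSV rows become a list of JSON objects. Deterministic code like
# this can also verify what the AI assistant produced.
raw = """name,role
Ada,engineer
Grace,admiral
"""

rows = list(csv.DictReader(io.StringIO(raw)))
print(json.dumps(rows, indent=2))
```

For small data either route is fine; the difference is that the code's mistakes, if any, are systematic and therefore easy to find, while the AI's occasional mistakes can land anywhere.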
They currently struggle with longer or more complex data, and although the results may look OK, the occasional serious mistakes demand effort in checking everything, as Dean Allemang discovered:
As the models are trained on more and more data, some kinds of mistakes become statistically less likely. But they lack the ability to reason about the world, to understand concepts and relationships, so on their own they will always make some logical mistakes. And because the output is plausible, those mistakes, expressed with full confidence, are not always easy to catch. The models also lack a confidence level, again because they have no idea what it means to be correct – they are simply generating patterns of words.
This isn't a secret by the way – the fine print in LLM-based tools lists pretty much the same limitations as Gary Marcus discussed in Deep Learning is Hitting a Wall.
The symbolic gap
In that article Marcus argues that a symbolic approach is needed to complement the deep learning approach taken by the large language models, such as the GPT family. “Symbolic” means that bits of computer code correspond to things and relationships in the real world, which makes it more likely that humans can understand and work with the relationships between those things (often through programming languages, but also through visual or other friendlier computer interactions, such as things on a screen that you can move around).
How will AI-generated writing benefit from a symbolic approach? Briefly, the ways that could make for better “AI writing about data” are:
- When people can build rules, models, or patterns around data using symbolic approaches, they can drive GPT tools more prescriptively, leveraging those tools' ability to produce human-like language around smaller, more precisely prompted chunks of data. Those symbolic structures can also help in checking what comes out of the AI tools.
- Something currently unexplainable about LLMs, coupled with a longstanding puzzle about “instinctual language abilities”, suggests that we may be close to being able to understand a computer's representation of data in natural language – interacting, asking questions, and getting answers that are correct from a purely logical standpoint. See the addendum to this post for why that might be.
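One way to picture the first of those points: keep the facts in a symbolic structure, let the language model do the phrasing, then check the phrasing against the structure. A toy sketch – the facts and the checking rule are purely illustrative, and a real system would need far richer checks:

```python
# Toy symbolic check: the facts live in a structured record, and any
# generated sentence is verified to mention every required value.
facts = {"product": "Widget X", "units_sold": 1200, "quarter": "Q3"}

def mentions_all_facts(sentence, facts):
    """Return True only if every fact value appears in the sentence."""
    return all(str(value) in sentence for value in facts.values())

generated = "Widget X sold 1200 units in Q3."
assert mentions_all_facts(generated, facts)

hallucinated = "Widget X sold 2100 units in Q4."
assert not mentions_all_facts(hallucinated, facts)
```

The symbolic record is the source of truth; the language model only supplies the wrapping. That division of labor is what lets the output be checked at all.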
Can AI writing help humans to act in complex environments?
Above were some thoughts on how AI can help with communicating data. But effective communication at work goes way beyond just transmitting raw facts in this way. Let’s take one of the most prosaic, factual forms of information – procedural guidance, for example a description of how to operate a machine. Someone without much experience of this kind of writing might just write down a list of steps. You could imagine those steps being generated from some kind of data about the machine and its controls. But if all the reader gets is the steps, they’ll likely have a great many questions. What happens when something goes wrong? What do I need to know before starting, about my environment and any resources I’ll need? What if this procedure is one of a number of things I need to do to get this machine doing what I need, in my particular case? How do those things fit together?
Supporting users' decisions and troubleshooting takes understanding of not only the machine itself but the user’s context – where they are, what they know, who they are, what they’re trying to do, and how they best learn new things. Can robot writers do that?
The current AI tools lack such context. But even with deep learning and symbolic AI tools combined, it’s likely that the most effective technical writing (or video presentation, or diagram, etc) will still need a human to help shape it.
Can AI writing keep your attention and challenge your thinking?
No matter how theoretically well information is represented and sequenced, that’s no good if the audience can’t stay awake. As I wrote a few months ago, sometimes language needs to challenge and surprise readers. We’re not automata; we’re humans who need to feel we’re talking with another intelligent and somewhat interesting human. Quirks and even clever infringements of language rules make us happy and keep us focused. Can robot writers do that?
I think their efforts would be really clumsy, like intercalating contingent synonyms in an otherwise bland sentence (like I just did). Random obscure words aren’t going to play with readers’ minds the way that Shakespeare did and we can aspire to.
A father, and a gracious aged man … have you madded
– “King Lear”, William Shakespeare
From my experiments, ChatGPT hasn’t yet come up with language that I felt expressed what I wanted to say, the way I wanted to say it. Some writers I follow feel the same way; here’s an example:
Can AI writing give your human writing a good start?
What a lot of people say is that while AI writing tools might not give you the final output you want, they could help you get started. Perhaps like an assistant who helps shape some ideas before you do the proper writing. Let’s look at that. Here’s what the GPT-based AI assistant in Notion wrote to continue an earlier draft of this post:
In the realm of unpredictable writing, machines will not be able to replace humans any time soon. Writing that involves storytelling, creativity, and emotional impact requires a level of human intuition and understanding that machines cannot replicate. While machines may try to generate stories or content, they will always lack the human touch that makes writing truly engaging and impactful. So, while machines may be able to handle more formulaic writing tasks, there will always be a need for human writing in areas that require creativity and emotional impact.
It nears but evades the points I want to make. It’s repetitive, and says things that sound deep but don’t show actual reasoning. I can't argue for or against these statements – there’s not enough to latch onto.
It reminds me above all of a writer who doesn’t really understand the domain but quotes and remixes bits of what they’ve heard in order to sound convincing. It sounds, for lack of a better word, like a bullshitter. And in that sense it’s really human-like, just not the kind of human writer who I can work with well.
This isn’t me being snobby. In my career I’ve tried many times to improve on writing that is essentially fluff like this, and I find it extremely hard. If there’s no sense of the domain, no sense of an argument waiting to be drawn out, however indirectly or poorly expressed, if there is no real substance there at all, it’s really hard to build on.
I talked about this style of writing in “Do you want your writing to ‘be’ something or to do something?”
Some incurious writers do seem to have committed themselves to be, not do. Asking them to clarify meaning will produce a blank face or an evasion, “it’s just to sound good”, or “it's only for social media”. As they acquire the habit of vague, superficial writing, they can no longer produce work that does something to its audience and in the process challenges their own thinking.
I’ve tried to edit academic work that circles in generalities and half-proofs. Short of starting anew, all I can do is polish the phrasing. I’ve tried to start writing from a foundation of bland corporate marketing-speak. That’s hard too, and again easier to start from scratch. It looks as if people writing in this way are paraphrasing and copy-pasting – close to the “stochastic parrot” or “Mad Libs” approach that GPT tools have been compared to.
Strategist and economist Patrick Chovanec likes planes and writes well, so I follow him on Twitter. He finds GPT-based writing a poor starting point:
How about less confident writers, or speakers of other languages? People sometimes say that AI assistance is the only way for them to communicate effectively. In my extensive experience with speakers of other languages, that’s not really the case – at least at the stage of building and connecting ideas, such writers can be very effective when they use simple sentences. (Sentences that native speakers often prefer to read over verbose jargon anyway.) Without the solid foundation that working out ideas in the target language gives, it would be hard for a writer to check and work with any AI-generated text. Machine translation is a different area with its own merits – but when we are talking about originating ideas in the target language, human writing works best.
(Actually, for all writers: confident or not, native users of the language they're writing or not, human speaking is a great way to start – see Peter Elbow’s "Vernacular Eloquence”.)
Do human writers resist robot help through fear or habit?
Although above I’ve quoted a couple of people active in the tech world, you might wonder if there is some kind of traditionalist resistance to GPT – a kind of prejudice that more forward-looking people lack. Anthropologist Henrik Karlsson does use GPT to help with some writing tasks, but doesn’t like to take it too far:
Entrepreneur and investor Paul Graham is very “tech-positive”, does favor AI writing assistance for some tasks, but also feels that over-reliance on AI assistance will make people think less:
This is the real downside of AI writing assistance. It not only can’t help us think; it seems to get in the way of thinking. After all, wrestling with ideas in writing is a wonderful way to clarify our thinking, as I’ve discussed previously. Graham again:
Can LLMs help with any kind of thinking at work?
Perhaps just the area of very high-level strategy? After all, these language models must have ingested a great many strategy documents; perhaps this latent knowledge can help us?
It doesn’t look good for this either. A real strategy is designed to make the most of a challenging situation at a point in time for a given organization. It is highly specific, so the bland, safe-sounding strategy recommendations that can be prompted from large language models are unlikely to be of much use.
In fact, cynically, some people feel that GPT tools can replace upper management simply because they see upper management’s task as producing communication that looks the part but lacks knowledge and insight:
What’s the best way for companies to avoid the influence of plausible BS?
One effective antidote is to encourage a culture of writing – human writing, that is. Amazon used it to great effect for many years. Highly experienced “big tech” product leader Shreyas Doshi also recommends such a culture:
Of course, thinking through writing can be hard. In the process of figuring out what to say, we may get temporarily blocked. But the better result, for us and our audience, is nearly always worth it. If we're tempted to take shortcuts, we should remember that the habit of honest thinking is easy to lose.
That was a lot of words; what's the conclusion?
When it comes to actually putting words on a page (not research or summarization), large language models:
- can save a lot of your time converting simple text-based data to different formats.
- where accuracy is important, can help narrate simple data in human-like language. They may also pave the way to simpler, more cost-effective, and less environmentally impactful ways to create text descriptions of data (current LLMs use a lot of computing power). There will be more automated journalism and some automated technical documentation.
- however, they lack the contextual model to communicate more complex information, for example to enable people to make decisions or plan actions at work.
- lack the ability to tweak language to most effectively keep people interested and engaged. They could simulate it, or be carefully prompted to do so, but results are likely to look clumsy.
- look as if they should be able to give experienced writers a good start in terms of ideas to further edit. But the ideas tend to be generic, uninspiring, and undeveloped, thus providing a very weak basis for building arguments.
- aren’t the best start for non-native speakers either. Those writers might want to start in their native language first and machine translate later – that can work OK. Or they may find it useful to try summarizing the points they’ve made using LLMs later. But starting to write from a basis of LLM output doesn’t seem to truly help, not if good thinking and clear communication is the goal.
- resemble those humans who write plausible words without understanding or helping the real issues.
To think clearly, the human writing process helps tremendously. A culture of good writing can also counteract the tendency for us writers at work (particularly managers!) to fall back on recycled ideas and fluffy pseudo-speak.
So human writing is still very much needed, at least wherever clear thought and good directional communication is valued.
Why should human readers need human writing instead of ingesting pure data?
Is this because as humans we’re inherently weak? Do we need story formats, linear information flows, instead of ingesting data in parallel, because of our habits or biological limitations? Life is a timeline after all, each life with a start and finish.
Or is it because we are something beyond data? That far from being human stochastic parrots as some over-thinkers say, our essence is something creative and ever-changing that could never be bottled up in a machine?
I think it’s a bit of both. As humans we are weak, but we have glimpses of a nature beyond weaknesses. Machines don't see as such, nor do they die. To mold ourselves in their image would miss a lot of what it is to truly live.
They make helpful and sometimes amusing assistants, though.
How could LLMs pave the way to more direct human-computer interaction?
The impressive language ability of the large language models suggests that we might be close to better understanding the complex underlying rules of real human languages. If languages were really as simple as the grammar we learn at school, it would be much easier to generate human-friendly text directly from data. But for a long time, languages have seemed to be far more complex than that, with many subtle exceptions and changes. In a very detailed article on the workings of LLMs, Stephen Wolfram notes that even though they are being trained on masses of data, that is still not enough data in theory to account for their success in human-like output that follows very complex rules. He writes:
is there a general way to tell if a sentence is meaningful? There’s no traditional overall theory for that. But it’s something that one can think of ChatGPT as having implicitly “developed a theory for” after being trained with billions of (presumably meaningful) sentences from the web, etc.
What might this theory be like? Well, there’s one tiny corner that’s basically been known for two millennia, and that’s logic. And certainly in the syllogistic form in which Aristotle discovered it, logic is basically a way of saying that sentences that follow certain patterns are reasonable, while others are not. Thus, for example, it’s reasonable to say “All X are Y. This is not Y, so it’s not an X” (as in “All fishes are blue. This is not blue, so it’s not a fish.”). And just as one can somewhat whimsically imagine that Aristotle discovered syllogistic logic by going (“machine-learning-style”) through lots of examples of rhetoric, so too one can imagine that in the training of ChatGPT it will have been able to “discover syllogistic logic” by looking at lots of text on the web, etc. (And, yes, while one can therefore expect ChatGPT to produce text that contains “correct inferences” based on things like syllogistic logic, it’s a quite different story when it comes to more sophisticated formal logic—and I think one can expect it to fail here for the same kind of reasons it fails in parenthesis matching.)
But beyond the narrow example of logic, what can be said about how to systematically construct (or recognize) even plausibly meaningful text? Yes, there are things like Mad Libs that use very specific “phrasal templates”. But somehow ChatGPT implicitly has a much more general way to do it. And perhaps there’s nothing to be said about how it can be done beyond “somehow it happens when you have 175 billion neural net weights”. But I strongly suspect that there’s a much simpler and stronger story.
… my strong suspicion is that the success of ChatGPT implicitly reveals an important “scientific” fact: that there’s actually a lot more structure and simplicity to meaningful human language than we ever knew—and that in the end there may be even fairly simple rules that describe how such language can be put together.
So perhaps the underlying rules are a bit simpler than people used to think. They could be used to build smaller, perhaps symbolic, models of languages, and data could then be transformed into language more accurately, effectively, and cheaply – in money, in computing cost, and hence in environmental impact too.
This idea of a somehow simpler model of language that has so far escaped description reminds me of Steven Pinker’s The Language Instinct, where he proposed that young children never hear enough language to account for their ability to speak in completely new sentences following complex rules – and that there must therefore be some kind of biological “language instinct” powering our use of words. But if, as the case of the LLMs suggests, the underlying rules are somehow simpler than previously thought, perhaps there’s no need for an innate instinct to explain how quickly humans pick up and use language.
The fact that Wolfram and Pinker, on very different grounds, both concluded that the underlying rules of language are simple though as yet undescribed leads me to think that Wolfram is onto something interesting.
For completeness and precision, here’s Wolfram’s own account of the possible future applications (clearly more abstract and engineer-like than my waffly “humans could talk better to machines”!):
So what would happen if we applied ChatGPT to underlying computational language? The computational language can describe what’s possible. But what can still be added is a sense of “what’s popular”—based for example on reading all that content on the web. But then—underneath—operating with computational language means that something like ChatGPT has immediate and fundamental access to what amount to ultimate tools for making use of potentially irreducible computations. And that makes it a system that can not only “generate reasonable text”, but can expect to work out whatever can be worked out about whether that text actually makes “correct” statements about the world—or whatever it’s supposed to be talking about.