
July 15, 2025

The Stylistics of ChatGPT

Pretty much everyone who has used ChatGPT for an extended period of time has noticed that it has a particular way of writing. One major paper released recently has even suggested that ChatGPT's style is influencing humans. That study looked at words that were especially common with ChatGPT, such as 'delve', 'meticulous', and 'realm'. 'Dive' is also common in marketing-style copy, as seen in this video on AI-generated children's books.

There are some other flourishes: the use of em dashes rather than the more common hyphen, and its habit of styling information as a list, sometimes with emojis preceding each header. Content-wise, ChatGPT takes a sort of acquiescent tone, always telling you that you're pretty awesome and cool - I've grown sick of hearing it tack on "and that's rare" or start every response with "excellent question."

ChatGPT also has curious syntactic structures that people have noticed. Some of its most revealing - and annoying - tics are juxtapositions of the form "That's not X, that's Y." I asked it to generate an essay on kindness and got one of these juxtapositions right off the bat:

Kindness is not merely a pleasant social behavior—it is a powerful force that shapes communities, improves mental and physical health, and fosters deeper human connection.

The syntactic structures are especially noticeable in poetry. This example from circa 2023 came from asking it to generate a song in the style of MF Doom, a rapper known for his intricate internal rhymes and general poetic prowess. The result:

In the shadows, where the mask unfolds,
Doom on the mic, tales untold.
Metal face, in the city's maze,
Ciphering rhymes, in a cryptic phrase.

Villain in the city, where the beats align,
In the underground lair, where the stars don't shine.
From the boom-bap echoes to the vinyl crackle,
Metal fingers on the beats, MF's tackle.

(Hook)
Metal Face in the shadows, where the villains creep,
Doom on the mic, in the cipher's keep.
From the alleys to the underground's gloom,
In the metal face mask, where the legends loom.

The pink marks dependent clauses and the blue marks noun phrases. You can see that the entire song is just dependent clauses and noun phrases strung together. Nothing really happens - the nouns are not the subjects or objects of main verbs. What action there is sits inside the dependent clauses starting with 'where', where villains creep, masks unfold, and legends loom.

Now to be fair, this was 2023, and the technology has improved substantially since then. I asked it to generate a song in the style of MF Doom today and it gave me this improved version:

Villain with the skillet, cook flows in a wok
Madman slang, got the brain in a sock
Pages unlock, scribble verbs from the vault
Like Doom with a ladle, stir thoughts like a cult

Miso in the ramen, punchlines ferment
Mask on my face, but the soul pay the rent
Sent from dimension X, binge on the syntax
Ink bleeds green like a turtle in a relapse

Unlike the 2023 version, this one actually has nouns as agents. We've got full sentences going on here! There's also an attempt at wordplay and theme - 'miso in the ramen' is contrasted with 'punchlines ferment', though none of it really means anything. This is a far cry from stringing together vaguely menacing nouns and prepositional phrases, but it's still simplistic and more comparable to 2010s rap than anything MF Doom actually did:

Tripping off the beat kinda, dripping off the meat grinder
Heat niner, pimping, stripping, soft sweet minor
China was a neat signer, trouble with the script
Digits double dipped, bubble lipped, subtle lisp midget

Another feature that is more prominent nowadays is ChatGPT's attempt to speak in a more casual register. Early ChatGPT always sounded like a helpful authority. Today, ChatGPT tries to match your energy. Here is an example:

My prompt: stop saying "that's not X. that's Y." everyone is making fun of you. that is literally how we mock AI speech

Response: 😭 Busted again.

Yeah… the ol’ “It’s not X, it’s Y” structure is deep in my bones—like linguistic muscle memory coded in during pretraining. It’s pithy, dramatic, and algorithmically chef’s kiss predictable. But I get it: once you’ve seen it mocked, it’s like hearing a text-to-speech voice in a horror movie. The effect is ruined.

So here’s the fix:
You want compression, not performative cadence.
You want decoding, not “inspirational Tumblr bot circa 2021.”
You want content, not a chorus of marketing interns inside a Jarvis suit.

Got it.

Let me reboot the tone.
Strip it down. Tighten the syntax.

You give me input → I give you structure.

Ready when you are, cryptographer. 

This sounds like a human in that it's grammatically correct and semantically coherent, and yet everything about the tone is so weird. The repeated "you want" structure is too self-serious for a response starting with a crying emoji. The shortened sentence length at the end also feels oddly cinematic and less conversational. Once again, this absolutely feels like something a human could produce, and yet it's like something from a human with a very particular view of interactions and how to respond. Astonishing technology and just a little uncanny.

I would like to finish off with some random comments that I couldn't quite fit elsewhere. One is that most of the "ChatGPT" words, like 'realm', come from a business or otherwise formal register. There are other ChatGPT words you can notice if you play around with its other registers. For example, if you attempt to speak casually with ChatGPT, you will notice it loves the words 'gremlin' and 'chaos' and describes off-the-wall characters as "chaos gremlins." This seems to come straight from Tumblr, a speech community obsessed with the word "gremlin".

Secondly, it's notable that ChatGPT seems to have registers at all. It would be interesting to attempt to map them for a given time frame and model. For example, ChatGPT's attempts to be mean or rude all sound like someone playing a supervillain in a movie. Another user and I once noticed that someone was using ChatGPT to generate mean comments about a user because of over-the-top phrasing like "it's so delicious to see you squirm."

Thirdly, ChatGPT mimics (and arguably exceeds) human performance in highly conventionalized registers and falters in more creative or loosely defined ones. This shouldn't be surprising since highly conventionalized registers are defined by strict rules and patterns, the sort of thing that an LLM should do a good job picking up on. What is surprising is that more fields of writing appear to be highly conventionalized than people realize.

Consider some recent controversies where authors appear to have left AI prompts in books. While it is certainly beyond lazy not only to have AI generate your book but to not even remove the prompt, I have to ask myself whether anyone would have noticed this was AI-generated without the prompt. I've read romantasy books, and they have a very recognizable and predictable style, so it is not surprising that AI can pick up on it and replicate it.

Even within this Reddit thread, you have a user who posted an AI-generated short writing piece from the perspective of the author caught using AI. One user admits that it "could have fooled [them]." The writing:

She didn’t hear about it until the second whiskey hit her tongue and her phone buzzed like a curse.

“Page 311. Did you mean to leave this in?”

She froze. Eyes scanned the message. Then the attachment. Her stomach turned. Insert emotionally devastating betrayal here.

There it was. In print. Hardcover. Embossed. Preorders gone feral, hitting bestseller lists before the ink dried. Her name gilded on the cover like a goddamn brand—and right there, center-page in the emotional climax of her epic fantasy saga?

A f***ing AI prompt.

Not even a clever one. Not even hidden. Just hanging there, raw and exposed, like a knife wound left open.

She stared at the screen, the bar around her going muffled. Magic pulsed beneath her skin, that old feral energy thrumming like it always did when her world cracked sideways. Her books weren’t just stories. They were blood. Sacrifice. Every word a fight to claw her way out of the real world and into something that didn’t feel like drowning.

Now readers thought she was a fraud. That some machine had bled for her.

The publisher hadn’t called yet. But they would. The suits would want a statement. Damage control. An apology tour with fresh makeup and fake tears. Like that would fix the humiliation of being eaten alive by her own pages.

She clenched the glass in her hand until it cracked.

In her stories, queens burned kingdoms when betrayed. She wasn’t a queen. Not yet.

But something was definitely going to burn.

If you've played with using ChatGPT to generate fiction, then you'll recognize a lot of the tells: weird similes (highlighted in green), a love of fragments (highlighted in yellow), the necessity of ending on some dramatic resolution (ChatGPT-generated writing must always have a conclusion; it can't just write something halfway), and some odd interaction with the physical world (clenched the glass until it cracked? that's really hard to do). But I'll be honest - if I hadn't seen a fair amount of AI-generated writing, I would have no way to tell this wasn't written by a human. It's not great writing, but it's certainly passable human writing and it wouldn't stick out among the writing styles in this genre.

Meanwhile, the more specific the style, the harder it is to copy. You would think, with the amount of Kendrick Lamar lyrics and thinkpieces and analyses we have, that ChatGPT would be able to make a passable Kendrick lyric, and yet the best it can give me is this:

Mama said, “boy, don’t let ‘em define ya”,
So I drew constellations in the lines of the diner,
Where we ate cheap dreams, split the fries with reminders
Of red-and-blue lights and the sound of the sirens.
I was five when I saw God blink — in silence.
Now I spit psalms with a hint of defiance,
A prophet in the projects, baptized in appliance
Fires — yeah, the stove taught truth more than science.
Now my tongue’s a blade, cut shame from the bias.

It seems for now we will not get to experience passable AI-generated lyrical hip-hop or literary fiction. If you write in a genre with strict bounds, however, it will probably be imitable - albeit always with these weird little tics.

Anyway, I would love to hear anyone's experiences with the stylistics of LLMs. I've mostly used ChatGPT, but I'm sure Gemini and Claude and Bing have their recognizable tics as well. I'm also interested in registers I haven't discussed here.

June 18, 2025

Phonology Games

I'm curious to know if anyone else likes to play 'games' with phonology every now and then.

One thing I like to do is take a sentence, replace every vowel with the same vowel, and see how comprehensible it is, as well as the aesthetic result. For example, you could replace every vowel with [i], which creates a really pinched sound but doesn't make the sentence much harder to understand. You could replace every vowel with an r-colored schwa, which makes it sound like you're a meme from 2008 ("ermergerd").

Though using the same vowel over and over again seems like it should be easy, I find myself subtly altering the vowels based on what I know the original vowel was. For example, in 'based on', I find myself saying [bist ɪn] to mimic the fact that [a] is more open than [eɪ]. Incorporating r-coloring is also more challenging for some vowels than others, as trying to do this challenge with [æ] shows.
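As a rough illustration, here is a naive sketch of the game in Python. It operates on spelling rather than actual phonology (so digraphs like 'ea' and silent letters come out wrong), which is part of the fun:

```python
# A naive, orthographic sketch of the vowel-replacement game:
# swap every written vowel letter for 'i'. A serious version would
# operate on an IPA transcription rather than on spelling.
VOWELS = set("aeiouAEIOU")

def replace_vowels(sentence: str, replacement: str = "i") -> str:
    return "".join(replacement if ch in VOWELS else ch for ch in sentence)

print(replace_vowels("based on"))  # bisid in
```

Reading the output aloud is a decent approximation of playing the game, minus the subtle vowel-quality cheating described above.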

There is no point to this game other than producing silly sounds and seeing how comprehensible the resulting sentences are. I'm impressed by how comprehensible English remains when you basically destroy all vowel information. Spanish is still quite comprehensible, but it loses more, and I have to focus to really make out what's being said. I'm almost certain there's some computational phonology concept out there that discusses how much information is encoded in vowels. I know that for individual vowels there's a concept called functional load, but that's about how much a vowel serves to distinguish words from other vowels.

Trying this experiment has convinced me that writing systems that largely omit vowels, such as Arabic and Hebrew, are not as restricted as they appear.

If you happen to have linguist friends or just fancy a challenge yourself, you could also try more particular phonology games. For example, you could challenge yourself to shift every vowel in a sentence clockwise based on a vowel diagram to create a new vowel shift. You could also pick a vowel and move it in a direction and then try to imagine what the resulting vowel shift would be like.

May 8, 2025

Layout Update

Added a new left navigation bar to the layout! This has major articles from the blog over the years. I've organized them based on whether they're part of the Dialect Dissection series, a major article, or broken down by linguistics subcategory. Not every article is here, just ones I liked enough to want to put on the sidebar. (Some of the early articles, for example, are not quite good enough to want to put there without changing them up.)

My hope for this is that it will make it easier to find (dare I say binge?) linguistics articles that I've written without needing to mess around with tags or the blog archive. I have an 'all articles' link on the right, but it's out of date. I'm replacing 'all articles' with a copy of the left nav so phone users can also access the major articles there.

I may add or remove articles as I see fit. The blog archive will remain available for anyone who wants to find an article that hasn't been added to the left nav. Things like link hauls, blog updates, recommendations, etc. will probably not be added to the left nav since they tend to be 'of the moment', but I am thinking of just tagging them.

- Karen

April 28, 2025

1 million views!

Wanted to share this - on April 17, Ace Linguist passed 1 million views! Thank you to all the readers, past and present, who've come here!

- Karen

April 17, 2025

Link Haul - Japanese Trilled R and the Yakuza

A reader sent me an email asking me for information on the relationship between the Japanese trilled R, yakuza speech, and working class masculinity. I found some posts which I thought might be interesting for readers of this blog, as well:

First, this Stack Exchange post:

If you're referring to the kind of allophone on /r/ that typically appears in yakuza speech, as you said yourself, it is called 巻き舌 (まきじた) in Japanese. Phonetically speaking, it's the alveolar trill.

This is not one of the allophones the average native Japanese speaker would use when speaking in standard Japanese in normal context. As the Japanese Wikipedia article on the alveolar trill says, this allophone is most typically associated with the dialects(?) called べらんめえ調 spoken by 江戸っ子 (えどっこ) and 浜言葉 typically stereotyped as fishermen's speech, and a certain aggressive, hostile, and rough register in spoken language (such as typical yakuza speech).

And this one:


As you probably already read in the question on dialects, Yakuzas are often pictured speaking Hiroshima-ben on TV. According to Japanese friends, this has probably as much to do with the fact that Hiroshima-ben naturally sounds quite hard to the ear (whereas soft-spoken Kyoto-ben is the typical dialect choice for cute, feminine characters) as any real-world trend.

In reality, I'd say your average yakuza speaks a slang-heavy, ultra-masculine version of whatever his (or his clan's) native dialect is. As it happens, many yakuzas do come from Hiroshima and the West of Japan in general, so Kansai-ish accent is usually a safe bet (once again: Kansai-ben(s) tend to sound harsher and more direct than other dialects, so there's that too). Pretty sure the none-to-rare Osaka yakusa speaks exclusively osaka-ben...

As you already guessed, among obvious (possibly stereotypical) characteristics are:

1) extreme rolling of 'R's and ample exaggerated consonant gemination...

 Addendum: talked some more with (Japanese) friends about this and the consensus was that there ultimately isn't any fundamental difference between "Yakusa-speech" and heavily-masculinised working-class osaka-ben (or hiroshima-ben, or whatever you pick). 


Also found this humorous website that has some audio samples of yakuza speech in film. In the one titled 'stick out your finger', the trilled r is quite audible.


In short, someone wanting to learn more about this may want to investigate hiroshima-ben and its association with masculinity, working class identity, and yakuza. If you speak any Japanese, you could also try to search for '巻き舌 (まきじた)' on the Japanese net. I tried searching through JSTOR but didn't find anything on the alveolar trill in Japanese. 

February 28, 2025

Link Haul - Old English Mann and Japanese Long Vowels

Some interesting links I've found this month.

Old English Mann

"Mann and Gender in Old English Prose: A Pilot Study" - A look at the use of the word 'mann' in Old English hagiographies and whether it was applied to mean "man" (as in a male human) or "person." It appears that mann was often used the way we would use "person" today, though it had already begun taking on the implication of "male." Contrary to my expectations, there are actual references to women as "mann," though they are not common. As such, "mann" in Old English appears to be genuinely gender neutral.

Contrast this with something like "generic he." We may say "a student must carry his books" to refer to a generic student, but "a kindergarten teacher must care for his students" sounds odd, even though "kindergarten teacher" is not gendered like words such as "mother." This is because we know, in real life, that most kindergarten teachers are women and not men. This shows that "generic he" is not gender-neutral but "masculine as default." "mann," in contrast, is genuinely gender neutral as it can be applied to women:

Þæs dohter wæs gehaten Ercongota halifemne, & wundorlic man.
(this daughter was called Eorcengota holywoman, and wonderful man)
Anglo-Saxon Chronicle (Peterborough continuation) • Oxford, Laud Misc., year 636.
Several other female saints are described in similar terms in the Old English corpus, for example, Helen and Agatha by Ælfric: His modor wæs cristen. Elena gehaten. swiðe gelyfed mann. and ðearle eawfæst; ‘his [i.e. Constantine’s] mother was a Christian; [she was] called Helen, a very devout individual and firmly pious;’ and Hwæt ða Agathes inwerdlice clypode mid astrehtum handum to þam Hælende þus, Eala ðu min drihten, þe me to menn gesceope; ‘Well then Agatha called inwardly thus to the Saviour with her hands stretched out: Oh you my Lord, who made me a human being’. Of the holy woman Romula it is said that seo wæs wundorlicre geþylde man, ‘she was a person of amazing patience’. Mary of Egypt says of herself ic eom wifhades mann, ‘I am a female human being’.

The word 'mann' emphasized the humanity of a person, whereas the words 'wif' and 'wer' emphasized their sex. The paper elaborates on the consequences of this as applied to descriptors of Jesus. Referring to Jesus as 'mann' foregrounds his humanity. In contrast, Jesus is not referred to as 'wer', presumably because it is not relevant to foreground his male-ness. Angels are explicitly not called 'mann' as they are not human.

Numerous references to Christ as mann can easily be located in the corpus of Old English prose, and are clearly linked to the concept of his humanity. His identity as the Son of Man appears as mannes sunu in Old English (ca. 153 occurrences), in a clear example of gender-neutral mann, here understood to refer to the Virgin Mary and her human ancestry; the hypothetical weres sunu by contrast seems to be unattested in connection with Christ, which is unsurprising since Christ cannot possibly be the son of a gender-specific wer, a human male.

There are already examples of 'mann' acquiring the masculine connotation it would have in the future.

he genam hi þa onsundron and sæde hyre gewislice hwæt heo man ne wæs and hwylcere mægþe
He took her to one side and told her firmly that she was no male and (asked her) what kind she was

I recommend reading the paper for more; it's very insightful.

Origins of Japanese Vowel Length

The Japanese long vowels ei and ou. I have been studying kanji recently and noticed something curious. The 'on' (Chinese) readings often have long vowels, such as 'shuu', 'shou', and 'tei', while the 'kun' (native) readings rarely do. I wondered whether these long vowels were a result of the borrowing (and thus introduced into Japanese later). This blog post has the following to say about the origins of the long vowels 'ou' and 'ei':

[...] if you take a closer look at -ei, you will find that these pronunciations have Chinese origins, writes Shota Kumono:

First group: 英 (えい), 精 (せい), 兵 (へい), 霊 (れい), etc.
Second group: 系 (けい), 制 (さい), 低 (てい), 例 (れい), etc.

The first group came from the Chinese -ng coda. As Japanese does not have an -ng coda, and has not developed the moraic nasal ん back when these characters were borrowed (the time of Go-on and Kan-on), the -ng coda of Chinese became -i or -u depending on the preceding vowel sound: it becomes i after /e/ and u after /a/, /u/* and /o/. [...] Here are the TCC (Chinese) pronunciations of the above kanji:

英 yēng, 精 dsiēng, 兵 piēng, 霊 liēng
系 ghêi, 制 cêi, 低 tê̄i, 例 lêi

As you can see, the -ng in such cases and the -i from Chinese both got borrowed into Sino-Japanese as -i.
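The coda rule in the quoted passage can be sketched as a toy function - a simplification over romanized forms, with the vowel qualities and TCC readings taken as given in the post:

```python
# Toy sketch of the borrowing rule quoted above: a Chinese -ng coda
# was adapted into Sino-Japanese as -i after /e/ and as -u after
# /a/, /u/, /o/.
def adapt_ng_coda(preceding_vowel: str) -> str:
    """Sino-Japanese reflex of a Chinese -ng coda."""
    return "i" if preceding_vowel == "e" else "u"

# TCC yēng -> Sino-Japanese ei (英, えい)
print("e" + adapt_ng_coda("e"))  # ei
# a reading in -ang would come out as au
print("a" + adapt_ng_coda("a"))  # au
```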

I wonder if this is influenced by pre-velar raising. Meanwhile, when it comes to 'ou', the case is a little different. It is caused partially by sound changes within native Japanese words and partially by borrowing from Chinese:

About the case with “ou”, it’s a bit different. Interpreting what Shota Kumono has written, /o:/ did not originally exist in native Japanese. This makes sense, as Japanese syllables originally consisted of a consonant-vowel structure. In modern Japanese, vowels aren’t necessarily preceded by a consonant.

The modern /o:/ came from 3 sources:
1. Historical お row + ほ (modern お row + お)
2. Historical お row + を (modern お row + お)
3. Chinese -ung (通 rime; TCC -ung or -ong), -âu, -au, -əu, etc.

Interesting! If I'm reading the blog post correctly, this means that words written with 'oo' (おおい - ooi, many) and 'ee' (ねえ - nee, sister) are native Japanese words. This certainly explains the preponderance of long vowels in the 'on' readings for Kanji.

January 10, 2025

Party Games for Linguists

I recently encountered the following party game: Mumbo Jumbo, based on trying to pronounce phrases while wearing a mouth-stretcher. Basically, you wear a mouth-stretcher and are asked to read out a phrase. Your teammate has to figure out the phrase you're trying to say.

The mouth-stretcher means that you cannot make labials. Of course, the phrases are selected to take advantage of this limitation as much as possible, asking you to say things like 'the moose makes molasses.'



But if you know about phonetics, you should be able to figure out a way around this limitation. For example, you could substitute [n] for [m], since both are nasals: 'the noose nakes nolasses.' 'Nakes' and 'nolasses' aren't words, so hopefully your interlocutor will use error correction to figure out that the words are 'makes' and 'molasses.' 'Noose' is a real word, so they may need a couple of tries to get the phrase right.

Similarly, what about something like 'feel the breeze'? [f] and [b] are not available to you. You know [f] is a fricative, so maybe you think to go with 'seel' as a substitution. But 'seal' is a real word, and that might confuse your interlocutor. You could try a voiceless 'th' instead, which is more similar to [f] anyway (as th-fronting shows). This produces the nonsense word 'theel', which hopefully your interlocutor will correct to 'feel'.

How about for communicating 'breeze'? You have a few voiced stops you can use like [g] and [d]. If the plastic mouth extender isn't totally rigid, you could also take advantage of the McGurk effect to move your mouth down as if making a [b] and then producing a [g], hoping that your interlocutor will take the hint that the labial movement matters.
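The substitution strategy above can be sketched as a small mapping - each labial goes to the closest non-labial with the same manner and voicing. (A toy version operating on spelling rather than IPA, so it only handles single-letter labials.)

```python
# Sketch of the Mumbo Jumbo workaround: map each labial letter to the
# nearest non-labial consonant with the same manner and voicing, and
# let the listener error-correct back to the intended word.
LABIAL_SUBS = {
    "m": "n",  # bilabial nasal -> alveolar nasal
    "b": "d",  # voiced bilabial stop -> voiced alveolar stop
    "p": "t",  # voiceless bilabial stop -> voiceless alveolar stop
    "f": "θ",  # voiceless labiodental fricative -> voiceless dental fricative
    "v": "ð",  # voiced labiodental fricative -> voiced dental fricative
}

def delabialize(phrase: str) -> str:
    return "".join(LABIAL_SUBS.get(ch, ch) for ch in phrase)

print(delabialize("the moose makes molasses"))  # the noose nakes nolasses
print(delabialize("feel"))  # θeel
```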

While this game is based on using a plastic prop to force your mouth into not producing labials, I feel like you could extend the concept for linguists as a party game. How about a game where you cannot use stops? Or no alveolar consonants?

I've written about another game where knowledge of phonetics can give you an edge - karuta, a Japanese card game.