Ace Linguist

July 1, 2026

Code-Switching for Robots - Language Mixing

I promise I'll write about things besides Japanese and LLMs. But first, a brief word on "language mixing."

I have occasionally encountered situations where LLMs suddenly and without warning begin mixing English and non-English words. Usually, the non-English language is also written in a non-Latin script. I've encountered Arabic, Korean, Japanese, and Chinese. The Japanese may be understandable in contexts where I've previously included Japanese text in the conversation for whatever reason, but the Arabic, Korean, and Chinese are quite head-scratching.

Chinese inserted. Image mine. 

Korean inserted. From source.

I've jokingly referred to this as "code-switching," the term for swapping between languages or dialects within a single discourse context. I say "jokingly" because human code-switching is purposeful: speakers select each language for a reason. The academic term for what LLMs do is apparently "language mixing." Unlike human code-switching, language mixing is often "unintentional" in that it is an unwanted artifact. LLMs can also produce text that swaps between two languages in a naturalistic way, which we could perhaps call "code-switching," but "language mixing" is the broader term and covers both cases.

 


Apparently, language mixing can actually help the models reason better. Suppressing bilingual chain-of-thought can degrade performance, which reminds me that people who code-switch tend to be people who are strong in both languages. (Of course, that is not to suggest we can generalize about LLM performance from human performance.)

At the same time, because the models are trained so heavily on English, the chain-of-thought itself may be in English, and forcing a model to reason in a language other than English may also degrade performance, even if the output is meant to be in that other language.

Users tend to feel unsettled when they see language mixing. It breaks the illusion that the LLM you are speaking to is a human-like companion, because it makes a speech decision that no human would make. Because it is so unexpected, it gives the impression that something has gone wrong with the LLM. Most language mixing today is minor slippage of a single word. However, there have been catastrophic instances of language mixing in the past.
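To make "slippage of a single word" concrete, here is a minimal Python sketch of how one might flag it in text that is supposed to be English. This is purely illustrative and not how any model or vendor detects mixing. The names `SCRIPT_PREFIXES`, `script_of`, and `find_script_slips` are made up for this example, and the approach (reading a coarse script label off the start of each character's Unicode name, using the standard `unicodedata` module) is a crude heuristic that only knows about the scripts mentioned above.

```python
import unicodedata

# Assumption for illustration: a coarse script label is inferred from the
# start of each character's Unicode name. Only the scripts mentioned in
# this post are covered; everything else is treated as "not a slip".
SCRIPT_PREFIXES = {
    "CJK": "Han",
    "HIRAGANA": "Kana",
    "KATAKANA": "Kana",
    "HANGUL": "Hangul",
    "ARABIC": "Arabic",
}


def script_of(ch):
    """Return a coarse script label for one character, or None."""
    name = unicodedata.name(ch, "")  # "" for characters without a name
    for prefix, script in SCRIPT_PREFIXES.items():
        if name.startswith(prefix):
            return script
    return None


def find_script_slips(text):
    """Return (word, script) pairs for whitespace-separated words that
    contain at least one character from a tracked non-Latin script."""
    slips = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word} - {None}
        for script in sorted(scripts):
            slips.append((word, script))
    return slips


if __name__ == "__main__":
    print(find_script_slips("The model is very 强大 today"))
```

Note the limits of a check like this. It looks at writing systems, not languages, so a "Han" hit cannot tell Chinese apart from Japanese kanji, and it would miss mixing between two languages that share the Latin alphabet entirely.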


