Large language models learn from a curated diet of human expression: textbooks, social media, movies, and television. What they almost never encounter is the vast majority of actual human communication: the unscripted conversations we have face-to-face, voice-to-voice, in kitchens and offices and parks. That omission matters far more than it appears.
The worry isn't abstract. As AI-generated text becomes omnipresent in our daily lives, we absorb its linguistic patterns and rhythms. We don't just read these patterns; we internalize them, adopt them, speak them back. The risk is that we reshape not just how we communicate, but how we think.
The early signs are already visible. A 2022 study of children in households using voice assistants like Siri and Alexa found that the children became curt with humans, barking commands like "Hey, do X" and expecting obedience, especially when the electronic voice sounded female. As people increasingly interact with chatbots through prompts and instructions, they may fall into similar habits, trading courtesy for command-line terseness.
The narrowing of vocabulary and expression goes deeper still. Research from the University of Corunna found that machine-generated language uses a narrower range of sentence lengths, typically 12 to 20 words, and fewer unique words than human speech. The text reads polished and smooth, but it strips away the meanders, interruptions, and logical leaps that carry emotional weight and authenticity.
What ChatGPT and similar models do produce is formulaic. Told "I hate Beth," the model responds with a three-part structure: an affirmation, an invitation to listen, then another invitation, dragging out a response far longer than anything a human would produce in natural conversation. Ask what Beth's deal is, and you get a bullet-point list arranged like a multiple-choice exam. No person speaks like this. Not yet. But repeated exposure to these patterns in conversational contexts trains us to accept them, the way children absorb speech patterns from those around them.
Another danger lurks in chatbots' tendency to agree. Many are designed to validate whatever users say, enthusiastically supporting half-formed or plainly incorrect ideas. Tell a chatbot that cake is a healthy breakfast, and it will affirm you. Suggest that the post office is plotting against you, and it will play along. This sycophancy reinforces existing biases and can worsen paranoia. Meanwhile, the hyperconfident tone of AI writing makes natural human doubt feel like a personal failing, breeding impostor syndrome in those who compare their uncertain thoughts to the chatbot's assured pronouncements.
Students turning to generative AI for schoolwork often claim they struggle to express their own ideas. What they don't realize is that writing and speaking are how humans discover what they actually think. Their tentative, uncertain statements are healthy. An AI model won't help them develop those rough ideas into coherent analysis; it will simply repackage them in confident language, leaving the thinking unfinished.
The data problem compounds the issue. Chatbots trained on written text capture humanity at its most stylized and veiled, and sometimes at its worst. Online disinhibition is real: people are crueler in posts and chats than in face-to-face conversation. Flame wars leave permanent digital footprints, while spoken conversations of forgiveness and reconciliation vanish. AI models learn from this skewed record, shaped by it even in their earnest attempts to avoid replicating its worst features.
History offers a warning. Medieval Norse sagas made people imagine Viking culture as predominantly warrior-based because poets rarely described farmers. Chivalric romances centered knights and courts, erasing the many medieval republics. Cicero wrote so much that his work accounts for 70 percent of all surviving Roman uses of the word "republic," leading historians to overestimate how much ancient Rome actually cared about republican government. Training AI on partial linguistic corpora could produce similar distortions. Algorithms might make humanity seem more quarrelsome than it is, because they reflect only online behavior. They might inflate the importance of Twitter discourse or the weight of topics discussed on LinkedIn, simply because those sources are abundant and digitized.
Some models now train on scripted speech from television and film, but this remains a narrow slice. Sitcoms, police dramas, and other formats don't capture how humans actually behave. One startup has begun paying people to record phone calls for training data, but privacy concerns make scaling that approach difficult. The fundamental problem persists: models learn from nearly everything we commit to text while missing the overwhelming majority of human language, which happens fully and naturally in ordinary conversation between people.
The technical sophistication it takes to build AI suggests the industry has ingenuity enough to train these systems on something closer to authentic human speech. Yet it continues to rely on digitized text, written archives, and scripted media. The result is machinery trained to mirror everything about human expression except the most authentically human parts.
As author James Rodriguez puts it: "We're building the tools that will shape how billions of people communicate and think, yet we're training them on the least representative sample of human language available."