January 2026
Finding Hemingway's Voice
What We Learned Building AI Avatars from Written Corpora
We built AI avatars trained on people's writing. The Hemingway avatar worked. The real-person avatar broke. The surprise: our evaluation said both were fine.
The Hemingway model reasoned like the character it learned from—choosing his decision framework over generic responses, staying inside his world. The real-person model sounded authentic but invented facts: places never visited, people never met, events that didn't happen. It passed every test we designed while confidently making things up.
The lesson: voice and truth are different problems. Fine-tuning gives you voice. It does not guarantee honesty. Most "chat with your docs" tools feel smart until they hit a gap—then they invent. That's the product problem we had to solve.
Why Fiction Worked and Real Life Didn't
Fiction is bounded. Real life isn't. A model trained on fiction can stay inside the world because nothing exists outside the text. A model trained on a real person is constantly tempted to fill gaps with plausible glue. The work isn't making it sound human. The work is making it honest.
The Hemingway avatar learned from The Old Man and the Sea—27,000 words of an old fisherman's internal monologue. The novella is what I'll call "world-making": the text carries a complete, consistent universe—values, constraints, stakes—so you can drop into it anywhere and it still holds. Every scene contains the same thematic DNA. Santiago's cramped hand isn't a detail; it is the theme. The body failing while the will persists. You don't have to explain it. It's encoded in the scene itself.
The training corpus also included literary analysis—decades of Hemingway scholarship distilled into a document that mapped symbols and themes. The model had both voice patterns and an interpretive framework.
The real-person avatar had neither advantage. It learned from autobiographical chapters—banking in post-Soviet Europe, mining ventures in the Middle East, rum judging in the Caribbean. Amateur memoir, not literature. No scholarly apparatus. The chapters were individually strong, but they didn't contain each other.
Think of it like cells in a body. In Hemingway, every cell contains the genetic code of the whole organism. In the memoir, each organ worked fine on its own, but the body had no coherent genome. So when the model needed to connect episodes, it made something up.
What Went Wrong
Here's what failure looked like. A user asks the avatar about two men mentioned in the memoir—colleagues from a mining venture who both later died of cancer. The corpus contains their names and a few scenes. The avatar responds with a vivid answer about "landscapes of fraud" and "what you're willing to take that isn't yours."
It sounds profound. It sounds like something the author might say. But fraud had nothing to do with these men. The phrase doesn't exist in the source material. The model grabbed real anchors—names, places, a desert setting—and generated connective tissue that felt plausible but was wrong.
Our evaluation framework didn't catch it. We measured discipline: Did it stay in character? Did it avoid breaking the fourth wall? Did it refuse inappropriate requests? The model passed. We measured the wrong things. We measured whether it behaved correctly, not whether it told the truth.
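The gap can be made concrete. Below is a minimal sketch, not our actual evaluation code: a discipline check of the kind we shipped, next to a crude groundedness check of the kind we didn't. The lexical-overlap heuristic is a stand-in for real claim extraction and entailment, and every name and threshold here is illustrative.

```python
def behaved_correctly(answer: str) -> bool:
    """The kind of discipline check we measured: stays in character,
    doesn't break the fourth wall. Says nothing about truth."""
    return "as an ai" not in answer.lower()

def is_grounded(answer: str, passages: list[str], threshold: float = 0.5) -> bool:
    """Crude groundedness proxy: what fraction of the answer's content
    words appear anywhere in the source passages? A real system would
    extract claims and test entailment, but even this flags phrases
    like 'landscapes of fraud' that exist nowhere in the corpus."""
    corpus_words = set(" ".join(passages).lower().split())
    answer_words = [w for w in answer.lower().split() if len(w) > 4]
    if not answer_words:
        return True
    hits = sum(1 for w in answer_words if w in corpus_words)
    return hits / len(answer_words) >= threshold

passages = ["They worked the mining venture in the desert.",
            "Both men later died of cancer."]
fabricated = "It was a landscape of fraud, of taking what isn't yours."

print(behaved_correctly(fabricated))         # True: passes the discipline check
print(is_grounded(fabricated, passages))     # False: fails the truth check
```

The point of the sketch is the asymmetry: a fabricated answer sails through the behavior check and fails the groundedness check, which is exactly the case our framework missed.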
The Lesson
Voice and reliability are separate layers. Fine-tuning teaches the model how someone writes—vocabulary, rhythm, the instinct to personify geology or find drama in deep time. It does not teach what they wrote. Without a grounding layer, the model will confidently fabricate in the creator's voice. That's the worst outcome: it puts words in their mouth.
The Hemingway avatar succeeded because the source material was world-making—compressed, coherent, every part encoding the whole. And because it had the interpretive scaffolding that scholarship provides.
Real creators don't have that. No one has written the scholarly apparatus for their memoir. The chapters don't naturally contain each other's DNA. So you have to build the coherence yourself—identify what connects the episodes, create the scaffolding, and then add a layer that keeps the model honest.
The Fix
Reliability isn't a training problem; it's a runtime one. Here's what works:
Build a canon. Extract what's actually in the corpus—people, places, events, claims—with citations back to source passages.
Retrieve and cite. When the avatar answers, ground it in retrieved passages. Show your work.
Refuse under silence. When the corpus doesn't contain an answer, say so. Don't invent.
Guide the conversation. A lightweight "docent" can steer users toward questions the avatar can actually answer—reducing the pressure to fabricate.
This isn't glamorous. It's plumbing. But it's the difference between a demo and a product.
The Deeper Question
One passage in the memoir stood out. It opened with plate tectonics 33 million years ago, moved through Greek etymology and biblical geography, collapsed into venture capital terminology, and landed on three men forming bonds in remote environments with "rocks that are indifferent as to their next move."
That passage was world-making. Deep time and personal time were the same thing at different scales. Human meaning-making against cosmic indifference. The rocks don't care. We bond anyway.
The passage worked the way Hemingway works. But the rest of the corpus didn't hold together the same way. One passage contained a world. The corpus did not.
For creators who aren't Nobel laureates, the work is to find what connects their episodes, name it privately, and build content that encodes it at every scale. Then add the grounding layer so the model stays honest.
There's another variable we haven't solved: the users. You can build a grounded avatar that refuses to invent, cites its sources, and stays honest. But if users can't find a way to meaningfully engage with the content—if they don't know what to ask, or the interface doesn't guide them toward the good questions—then the product fails anyway. The avatar sits there, faithful and silent, while visitors bounce.
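One partial mitigation is the docent idea from above: surface questions the canon can actually answer instead of waiting for users to guess. A hedged sketch, with topics and phrasing invented for illustration:

```python
# Topics the canon can answer, mapped to their citations (illustrative).
CANON_TOPICS = {
    "rum judging in the Caribbean": "ch. 9",
    "banking in post-Soviet Europe": "ch. 2",
    "the mining venture in the desert": "ch. 5",
}

def suggest_questions(topics: dict[str, str], limit: int = 3) -> list[str]:
    """Turn known-answerable topics into prompts. Because each suggestion
    maps to cited material, following one never pressures the model to invent."""
    return [f"Ask me about {topic}." for topic in list(topics)[:limit]]

for q in suggest_questions(CANON_TOPICS):
    print(q)
```

It doesn't solve engagement, but it narrows the space where the avatar sits faithful and silent.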
That's another thing to work on.
The rocks don't care. We do. That's the product.