Hearing our book changed how we wrote
Jan and I didn't set out to create an audiobook. Yet, that's what we've published. We turned The Spectral Agent into an audiobook using AI voices and our combined creative and technical skills. What started as a way to hear our drafts aloud while walking evolved into a full production pipeline, with original music, AI-generated cloned-voice narration, and audio mastering.
While we hope to hire a voice actor in the future, iterating with AI voices let us fine-tune the performance and quality. We've used transcription tools and voice generation to help refine the story. Hearing the story out loud helped us spot issues, refine characters, and keep the work moving forward. The audiobook grew naturally out of our process.
Creative and technical collaboration
Jan and I work best when we combine our strengths. Jan writes, draws, and composes music. I take on editing, managing tools, and now, learning to mix and master audio. He's the more creative one; I like the technical details.
The Spectral Agent is Jan's brainchild. Viktor Levitsky and his cohorts have been percolating in Jan's mind for years. I'm the editor, or as I often put it, Jan's technical advisor.
Together, we've been developing skills that didn't seem related to storytelling at first, yet they've ended up feeding into the audiobook process. For example, Jan has been learning to make better music, and I've been learning how to mix it.
As we each improved in those areas, we began applying those lessons to how we work on The Spectral Agent. Everything we do follows the same pattern: try something, improve it, iterate. That rhythm moves us forward.
From manuscript to audiobook
The story itself had been forming for years, but Jan finally began writing it in early 2025. He had the complete manuscript written within a couple of months. Once it was finished, we started editing.
We avoided heavy editing while writing; it's important to have a complete end-to-end first draft before getting bogged down in the details. Still, we refined scenes as Jan wrote them, and listening to the story read aloud became a critical part of both the writing and the editing.
We used AI-generated narration to bring the story to life, but the real advantage was the feedback loop. Write a scene, hear it out loud, then go back and refine it. That approach shaped not just the eventual audiobook but the story itself.
While the manuscript was written in a couple of months, completing all the edits is taking far longer. We're still evolving the story as we publish each chapter.
Walking is when we're most creative
A big part of our creative process is going on walks together to talk through story ideas, characters, and plot points. Walking is great for getting creative juices flowing.
The setting of our long walks has shaped the story, too. Jan wrote the early chapters during winter, when we were walking in the cold. Jan prefers the cold—it’s the only time we can really walk and think for hours.
So naturally, the book is set in the cold months. When Viktor tightens his jacket in the story, it's because we were doing the same on those walks. Now that it's summer, walking is less of an option, so we're focusing on editing and publishing.
How walking led to the audiobook
However, it's challenging to recall everything we discuss while walking. We cover so much ground that by the end of a walk, I can barely remember what we started with.
Jan has had complex story ideas for years, so many that they're hard to keep track of. I've used Obsidian to track my own thoughts for a few years; I'm not great at remembering things, so it serves as my second brain. I got Jan to start using it as well. Over the past few years, he's built a massive vault—hundreds of thousands of words—cataloging characters, timelines, and story ideas.
For at least a year or two, we typed notes into Obsidian on a phone as we walked. It was awkward, though, so we'd jot down only a few notes, and a lot of nuance was lost. Jan would often find himself trying to remember or recreate those ideas later when he finally sat down at a computer.
Last year, while sitting in a coffee shop discussing a story, we started using AI to transcribe our conversations. That was a turning point. With it, we could talk naturally and still capture all the nuanced points. With everything transcribed, Jan had the finer details of our discussions to draw inspiration from when sitting at a keyboard.
When we didn't have that, we'd sometimes reach a later chapter and wonder, "Wait, why did this happen?" We'd realize Jan had forgotten to include some detail we'd first discussed on a walk. The iterative process helped us go back and fix those continuity issues, but transcription made it much easier to keep the story cohesive from the start.
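If you want to capture your own walks the same way, the idea is easy to script. Here's a minimal sketch using OpenAI's speech-to-text API (we mostly just talked to the ChatGPT app directly; the filenames below are made up):

```python
# A minimal sketch of transcribing a recorded walk with OpenAI's
# speech-to-text API. Filenames and paths are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("walk_2025-02-14.m4a", "rb") as audio_file:  # example recording
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Drop the text into the Obsidian vault to mine for details later.
with open("vault/walk-notes/2025-02-14.md", "w") as note:
    note.write(transcript.text)
```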
Turning drives into draft reviews
The transcribed text was extremely useful. However, what really shaped our process was hearing AI read back our conversations and story to us. At first, we used ChatGPT to transcribe ideas. Then Jan would write a chapter and read it aloud himself while we walked. At some point, we started pasting the chapter into ChatGPT so it could read it aloud.
That kind of listening felt familiar. We’ve listened to audiobooks together for years while traveling. So, hearing Jan's story that way was just a natural extension of how we already enjoy stories. It gave us a sense of the pacing, tone, and energy of the scenes from a listener’s perspective.
Another place where transcription and audio generation helped was in the car. While I drove us around town, we'd discuss story ideas and listen to Jan's latest draft. Before AI transcription, I'd often type notes for Jan because I type much faster, but while driving I couldn't type or read, and we didn't always have a computer with us.
Giving the characters a voice
We began to rely on the generated audio to hear the story come alive. But ChatGPT isn't really intended to be a voice generation service. It didn't handle being handed a couple of thousand words to read aloud; it often refused outright. So we looked for other options to generate the audio.
We started using ElevenLabs, which has a feature specifically for audiobooks. We could paste a completed chapter into it and hear it read aloud without any of the frustration of coaxing ChatGPT to read it.
At first, we used the Viktor voice to read early drafts. At some point, it became the voice of Viktor in our minds; we couldn't think about the character without hearing it. Though it wasn't until much later that we decided to publish as an audiobook.
You can hear what Viktor sounds like below. We think it's quite fitting. What do you think?
Listening to a book rather than reading it changes how you think about writing. A good audio performance requires the book to be written differently. Traditionally, a book might have a lot of "he/she/they said" after each line of dialogue. You may not notice these tags while reading, but they become grating when you hear them.
So you have to write in a way that makes it obvious which character is speaking, without constantly naming the speaker or leaning on pronouns. Pronouns alone get confusing anyway when multiple characters of the same gender are talking, or when a character is non-binary.
We decided to use a different voice for each character because the characters are alive in Jan's mind, and hearing them helps him develop each one. Jan draws every character, so he knows what they look like; hearing their unique voice brings them fully to life.
Even though the voice alone tells you who's talking in the audiobook, we didn't want you to have to guess, even if the book were narrated with a single voice. So we tried to make the speaker clear in the prose without explicitly naming them, unless doing so added nuance. Hopefully, that comes through in the writing.
The iterative process of using AI-generated narration helped us make the book better: clearer, more concise, and less grating when read aloud. Had we hired a voice actor, each round of feedback would have meant days of back-and-forth waiting for a new read. With AI voices, we can hear what the book sounds like within minutes.
At this stage, while we're still evolving the book as we edit and publish it, we need fast feedback. If it gains enough popularity and we secure the necessary funds, I would love to hire a voice actor to perform the book. It would actually be cool to have the original voice actors behind the AI voices read it.
By the time a voice actor receives it, they'll have a refined book that works well for an audio performance, and the slow, iterative refinement will already be done without eating up their time.
Tweaking voices
Even with ElevenLabs narrating, there was still a lot of work needed to make it production-ready.
The AI voices aren't ready to go right out of the box. There were many times when they'd say things in strange, unexpected ways. For example, here's the first (hilarious) take of Viktor shouting:
In cases like this, I'd regenerate the sentences until they sounded right. I'd also tweak the settings for each virtual voice actor; ElevenLabs has a few knobs to adjust, such as speed and similarity to the original voice actor, among others.
For example, the Viktor voice is provided by a man speaking with a Russian accent. Viktor was born in Russia but has lived in the United States since early childhood, so it makes sense for him to have a slight accent.
The voice actor's delivery doesn't always match the pacing of English prose, either. Sometimes the Viktor voice speeds up or slows down at odd points; sometimes that works in the story, and sometimes it doesn't. We tried to adjust it, but it's not always perfect.
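To give a feel for those knobs, here's a minimal sketch of per-voice tuning, assuming the v1-style ElevenLabs Python SDK. The voice ID is a placeholder and the setting values are illustrative, not our actual configuration:

```python
# A sketch of per-voice tuning with the ElevenLabs Python SDK.
from elevenlabs.client import ElevenLabs
from elevenlabs import VoiceSettings

client = ElevenLabs(api_key="YOUR_API_KEY")

VIKTOR_VOICE_ID = "voice-id-here"  # placeholder, not a real ID

audio = client.text_to_speech.convert(
    voice_id=VIKTOR_VOICE_ID,
    model_id="eleven_multilingual_v2",
    text="Viktor tightened his jacket against the cold.",
    voice_settings=VoiceSettings(
        stability=0.55,         # lower = more expressive, higher = more consistent
        similarity_boost=0.80,  # how closely to match the original voice actor
        style=0.20,             # style exaggeration
        use_speaker_boost=True,
        # newer SDK/API versions also expose a speed setting
    ),
)

# convert() streams the audio back in chunks.
with open("viktor_line.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```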
Sometimes a voice was simply wrong, and we couldn't get a consistently good performance out of it. Then we'd have to swap in a similar voice without changing too much of the character we'd already created.
All of the voices have different volume levels. ElevenLabs has a normalization feature on export, but it sounds too compressed and overdriven. Additionally, audio isn't normalized when previewing each chapter or sentence in the app, and previewing is a crucial part of our workflow.
I wrote a script that makes each voice say "The quick brown fox jumped over the lazy dog", measures the peak volume of each recording, and normalizes every voice to the loudest one. This way, no voice is too loud or too quiet. A second script updates the settings in ElevenLabs, so the voices are equally loud when we preview, which also simplifies later parts of the process.
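Here's a minimal sketch of the peak-matching idea, not our exact script. It assumes each voice has already read the pangram to a voices/ folder (generating those samples is left to whatever TTS call you use), and it prints the per-voice gain offsets:

```python
# Measure each voice's peak level and compute the gain needed
# to match the loudest voice. Sketch only; paths are illustrative.
import numpy as np
import soundfile as sf
from pathlib import Path

def peak_dbfs(path: Path) -> float:
    """Return the peak level of an audio file in dBFS."""
    data, _ = sf.read(path)
    peak = float(np.max(np.abs(data)))
    return 20 * np.log10(peak) if peak > 0 else float("-inf")

peaks = {p.stem: peak_dbfs(p) for p in Path("voices").glob("*.wav")}
loudest = max(peaks.values())

# Gain (in dB) each voice needs so its peak matches the loudest voice.
for voice, peak in sorted(peaks.items()):
    print(f"{voice}: peak {peak:.1f} dBFS, apply {loudest - peak:+.1f} dB")
```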
Audio production
Even with all the tweaking of voices inside the ElevenLabs app, the raw AI output wasn’t clean enough for a polished release.
That's where the rest of our workflow came in. I'd export the voice tracks as WAV files and drop them into Logic Pro for mastering. From there, I'd apply EQ, compress the dynamic range, remove unwanted noise, and tame breath sounds. For example, the Viktor voice is boomy in the low frequencies and tends to take a sharp breath at the end of sentences.
We also have to do some mixing. For example, the first chapter opens with a music intro Jan created, and we need to mix it so you can hear both the narration and the music.
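We do that mix in Logic Pro, but the core move, ducking the music a few dB under the narration, can be sketched with pydub. Filenames and levels here are made up:

```python
# Not our Logic Pro session, just the ducking idea sketched with pydub.
from pydub import AudioSegment

voice = AudioSegment.from_wav("chapter1_voice.wav")
music = AudioSegment.from_wav("intro_music.wav") - 9  # duck the music ~9 dB

# pydub's overlay() truncates to the base track, so pad the music bed
# with silence long enough to cover the narration.
bed = music + AudioSegment.silent(duration=len(voice) + 3000)
mix = bed.overlay(voice, position=3000)  # narration enters 3 s into the intro
mix.export("chapter1_mix.wav", format="wav")
```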
It took multiple rounds of listening across the platforms where we typically listen to audiobooks: in the car, on the phone speaker, on AirPods, and on our studio headphones. We wanted it to sound good on all of them. I'd then listen in each app (Substack, Spotify, Apple Podcasts) to make sure no platform processed the audio so differently that it hurt the performance.
For the four chapters currently published, I re-uploaded multiple times as I tweaked the audio. We want to lock down the audio process early, because each time we change a voice or the audio levels, we have to regenerate and re-upload all the earlier chapters. Otherwise, someone listening to the book straight through would hear the performance change along the way.
I didn’t start with a background in audio. But Jan’s been producing music, and I’ve been learning how to mix it. Those skills came together here. They weren’t part of the plan, but they became essential to finishing the audiobook. That’s the core of our process: try, learn, apply—then do it all again.
What's next?
We’ve published the first few chapters and have around thirty total written. Every chapter is a chance to improve our process—writing, editing, voicing, mixing. And everything we’ve learned here will carry into the next book. Jan already has outlines and characters ready for future stories, many of them connected to this one. We’re not just building a book—we’re building a universe of stories, layer by layer.
If there’s one lesson we want to share, it’s this: don’t wait for things to be perfect. Write a complete first draft. Finish something, even if it’s rough. Then go back and improve it. You’ll learn more from fixing a finished piece than from polishing a single chapter endlessly.
That’s what we’ve done—and are still doing—with The Spectral Agent. This version isn’t perfect. But it’s the worst it’ll ever be, because we’ll keep making it better. The only way to get feedback is to put it out there. So, we’d love to hear your thoughts.
And if you’re working on your own project, we hope this encourages you to do the same—get your work out there, take feedback, and keep building.
Your first draft doesn’t have to be flawless. It just has to exist. The rest comes through iteration.
If you want to try making your own audiobook, check out ElevenLabs. This affiliate link supports us in making more content.
To see how this all turned out, check out The Spectral Agent Podcast.