AI and Recording Audiobooks

I found one audiobook series that I enjoy, almost more than reading it. The reason is the way the actor creates the characters. Each character has its own unique voice. The protagonist keeps the same voice from book to book, so there’s consistency across the series. In one of the stories, a character gets a broken nose. The actor adjusted the voice for this character to simulate a stuffed nose and pain.

I’ve read that one of the many uses for Artificial Intelligence (AI) could be to create audiobooks. Once an author’s or actor’s voice has been recorded, the AI would be able to create new audiobooks using their likeness. This would likely take a fraction of the cost and time. A real human can only read for so many hours a day, and the small sounds we make, such as throat clearing or sniffing, have to be edited out of the recording. AI would likely be able to create a new audio recording instantly without those human noises. However, would AI interpret the meaning properly to match voices to action?

I’m not an actor, but actors are skilled at creating new characters, sometimes simulating different accents, inflections, moods, and emotions, all with their voices. If the AI were creating an artistic reading, how would it be able to create new characters? Especially if the voice and character were new, one that the actor being imitated had never used before. Would the AI pick up on changes, like a broken nose, and adjust the voice to match the action?

This would require a subtle understanding and nuance that I’m not sure AI is capable of. In my opinion, this is a good thing. Part of what makes the arts exciting is the way they constantly change and evolve. I enjoy this actor’s ability to create characters with his voice and express the emotions of the action. The voices and characters are different in every book of the series, with the exception of the main character, who remains consistent.

Even if the audiobook were something without characters, such as an author reading non-fiction, there would still be inflection, emphasis, cadence, and the general rhythm of one’s voice. I’m sure AI could do a reasonably good job, likely good enough that most people wouldn’t notice the difference, but we would lose something intangible.

