
5 Best practices for podcasting with synthetic voices

Obviously, there is no better way to create a podcast than to record, edit and master it all by hand. But that immense effort is hard to justify when you compare it with the revenue a podcast brings in, so you can see the need for alternatives such as synthetic voices, right?

If you want to do it right, here are five best practices we discovered over the past 12 months that should help you get your podcast on air.

Hear the article below, or scroll to read it yourself ;)


Shorter text always sounds better than longer text

Across the more than 1,000 podcasts we've produced this year alone, we've found that short texts are far better suited to conversion into podcasts by synthetic voices than longer texts.

Podcasters and news editors got feedback from their listeners that 2-4 minutes is the ideal length for a news podcast. In comparison, listeners of audio longer than 5 minutes start to disengage and will most likely not remember what they heard at the beginning.

Short bursts of content, however, in conjunction with breaks, sound effects and visuals, work great.

That said, a well-written article is not automatically engaging in audio format. Use summaries to get your point across instead of just rendering a long text.
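The 2-4 minute guideline can be checked before any audio is rendered. Here is a rough sketch in Python that estimates the spoken length of a script from its word count. The speaking rate of 150 words per minute is our assumption for illustration, not a measured value for any particular voice, and the function names are made up for this example.

```python
import re

# Assumed average speaking rate; real synthetic voices vary, so measure your own.
WORDS_PER_MINUTE = 150


def estimate_minutes(text: str, words_per_minute: int = WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration in minutes from a simple word count."""
    words = re.findall(r"\S+", text)
    return len(words) / words_per_minute


def length_verdict(text: str, min_minutes: float = 2.0, max_minutes: float = 4.0) -> str:
    """Classify a script against the 2-4 minute range reported by listeners."""
    minutes = estimate_minutes(text)
    if minutes < min_minutes:
        return "short"
    if minutes > max_minutes:
        return "long"
    return "ok"
```

If the verdict is "long", that is a hint to summarize the text or split it into several shorter episodes.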

Alternate between multiple speakers and vary the speech speed to increase the dynamic range of the spoken content

Break longer formats, such as podcasts, into sections and have them spoken by two voices instead of one. The alternating voice models have different prosody patterns, which helps the human mind to focus.

Experiment with the speed of the voice. Depending on your format and tonality, a faster voice can give your content that extra touch.
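Many text-to-speech services accept SSML, the W3C markup for speech synthesis, although support differs from provider to provider. Assuming yours does, alternating voices and setting the speed could look roughly like this. The voice names "voice_a" and "voice_b" are placeholders, not real voices, and the helper function is hypothetical.

```python
from itertools import cycle
from xml.sax.saxutils import escape


def alternate_voices(sections, voices=("voice_a", "voice_b"), rate="medium"):
    """Wrap each section in SSML, switching voices from one section to the next."""
    parts = []
    # cycle() repeats the voices endlessly; zip() stops when the sections run out.
    for text, voice in zip(sections, cycle(voices)):
        parts.append(
            f'<voice name="{voice}">'
            f'<prosody rate="{rate}">{escape(text)}</prosody>'
            f"</voice>"
        )
    return "<speak>" + "".join(parts) + "</speak>"
```

Calling alternate_voices(["Intro", "Main story", "Outro"], rate="fast") gives the first and third section to one voice and the second section to the other. SSML defines rate values such as "slow", "medium" and "fast", so you can compare them by ear.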

Hygiene: Add more commas, breaks and periods than you would in written text

Synthetic speech tends to plow through text without emphasizing specific words in a nuanced way. This is what makes it sound robotic. These parts are easy to spot when you listen to the rendered speech.

To create a more nuanced prosody:

Use commas to emphasize the word before

Use periods to break monotonous speech

Build shorter sentences instead of long ones with lots of commas

Use breaks with different lengths in between sentences or sections to imitate natural speech
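Breaks of different lengths can be added automatically if your provider understands the SSML break tag. The sketch below puts a short pause between sentences and a longer one between paragraphs. The pause lengths of 300 and 800 milliseconds are starting values we assume for illustration, so tune them by ear.

```python
import re
from xml.sax.saxutils import escape


def add_breaks(text, sentence_ms=300, section_ms=800):
    """Insert SSML breaks: short between sentences, longer between paragraphs."""
    # Paragraphs are separated by one or more blank lines.
    sections = [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]
    rendered = []
    for section in sections:
        # Split after sentence-ending punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", section)
        sentence_break = f'<break time="{sentence_ms}ms"/>'
        rendered.append(sentence_break.join(escape(s) for s in sentences))
    section_break = f'<break time="{section_ms}ms"/>'
    return "<speak>" + section_break.join(rendered) + "</speak>"
```

The sentence splitting is deliberately simple and will stumble over abbreviations such as "e.g.", which is one more reason to keep sentences short and plain.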

Use alternative spelling for words that are being pronounced incorrectly

Text-to-speech models, no matter which provider, are not perfect yet. You will encounter bugs where specific words are not pronounced correctly. Foreign words tend to be pronounced as if they belonged to the voice’s language. If you spot this, it’s best to try an alternative spelling to force the voice to pronounce the word correctly:

“controller” is being pronounced “comtroller” in some voices

Alternative spelling: “conn-troller” fixes this bug

The French term “Coup de Grace” can be spelled “Koo-de-grahs” to force correct pronunciation
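Once you have collected a few of these fixes, you can apply them automatically before the text goes to the voice. Here is a minimal sketch in Python that uses the two examples above. The function and dictionary names are made up for this illustration.

```python
import re

# Respellings taken from the examples above; extend with words your voice gets wrong.
RESPELLINGS = {
    "controller": "conn-troller",
    "coup de grace": "Koo-de-grahs",
}


def apply_respellings(text, respellings=RESPELLINGS):
    """Replace whole words or phrases, ignoring case, with their respelling."""
    for original, respelled in respellings.items():
        # \b keeps the match to whole words, so "controllers" is left alone.
        pattern = re.compile(rf"\b{re.escape(original)}\b", re.IGNORECASE)
        # A function as replacement inserts the respelling literally.
        text = pattern.sub(lambda _match, r=respelled: r, text)
    return text
```

Keep the list specific to each voice, since a respelling that fixes one voice may sound wrong in another.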

Use sound to enrich the experience

Sound makes a huge difference in audio. Don’t stop after creating speech. Only by adding sound effects, music and sonic branding can you really bring text-to-speech alive.

Use an elastic sound template, or have one built for your brand, to really take advantage of this. When you split your audio into sections, the sound design will adapt to the point you are trying to make.

Add sound effects for emphasis, such as bumpers or risers.



Keen on learning more?
Get in touch!


