
AI Voiceovers

Written by Christopher John
Updated over 3 weeks ago

Introduction

The AI Voiceover feature lets you generate lifelike voiceovers directly within the creative editor using Google Gemini 2.5 TTS and Chirp 3 HD Voices. You can enhance multimedia content with customised narration, dialogue and sound design, all within the editor and without the need for external tools.


At its core, AI Voiceover uses two cutting-edge voice synthesis models:

  • Google Gemini 2.5 TTS: Prompt-based voice styling using plain English instructions.

  • Chirp 3 HD Voices: Best-in-class realism, ideal for emotionally rich or narratively complex content.


What Can You Do?

  • Create Your Script

    • You can manually input your narration/script, generate it with AI or upload a .txt or .docx file - all within the editor.

  • Style the Voice with Prompts (Gemini only): Just type, as you would a prompt, how you want the voice to sound - e.g. 'Read this like a dramatic film trailer'.

  • Localise with 40+ Languages: Create content for global audiences with regional voices and native accents. No dubbing required - just select your language and go.

  • Control Speed, Volume & Pauses: Whether you need more energy or a slower, more meditative pace, you have control.

    • Adjust the speed from 0.25x to 2.0x

    • Insert [pause], [pause short] and/or [pause long] to fine-tune the rhythmic flow (Chirp 3 only)

    • Boost volume to sit perfectly over background music

  • Drop it Into Your Timeline Automatically: The generated voiceover is instantly added to your project timeline. There is no need to import or drag and drop.
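
If you draft scripts outside the editor, a quick local check can catch mistyped pause markers and out-of-range speed values before you paste the text in. The following is a minimal sketch in Python, not part of the product: the function names are hypothetical, and the only facts taken from this article are the three pause markers and the 0.25x to 2.0x speed range.

```python
import re

# Pause markers listed in this article (Chirp 3 only).
SUPPORTED_PAUSES = {"[pause]", "[pause short]", "[pause long]"}
# Speed range listed in this article.
MIN_SPEED, MAX_SPEED = 0.25, 2.0


def unsupported_tags(script: str) -> list[str]:
    """Return bracketed tags in the script that are not supported pause markers.

    Hypothetical helper: it treats any [bracketed] text as a tag.
    """
    tags = re.findall(r"\[[^\[\]]*\]", script)
    return [tag for tag in tags if tag not in SUPPORTED_PAUSES]


def clamp_speed(speed: float) -> float:
    """Clamp a requested speed into the 0.25x to 2.0x range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


script = "Match point. [pause long] She serves. [pause short] Ace! [puase]"
print(unsupported_tags(script))  # ['[puase]']
print(clamp_speed(3.0))          # 2.0
```

Any tag the check reports is likely a typo, so it is worth correcting before you generate the voiceover.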


Top Use Cases

  • Social Media Ads: Fast and persuasive voiceovers with controlled, stylistic flair.

  • Product Explainers: Clear, well-paced and multilingual narration for onboarding or educational content.

  • Storytelling & Dialogue Scenes: The ability to use multiple voices and descriptive tones to bring stories to life.

  • Localisation at Scale: Easily create content for global audiences.

  • Tutorials & Walkthroughs: Pair calmly paced, clear voiceovers with your content to guide your viewers.


How Can I Get Started?

Voiceovers can be created and added in the Creative Editor. You can either open an existing Creative or create a new one and drag an asset in. In this example I am using a 1080x1080px template and have dragged in a short 2.7s clip of a tennis shot:

On the left-hand side I have expanded the voiceover panel and can scroll down to adjust the various parameters:

  • Script (enter it manually, or use the buttons to upload a .txt or .docx file or to open the AI Prompt box and prompt for a script)

  • Voice-over Styling Instructions

  • AI Model Provider

  • Language

  • Voice

  • Speed

  • Volume Gain

Here I start with a simple prompt describing the shot, add some stylistic guidance and choose a voice. The chosen model determines which of the other options are available: styling instructions apply to Gemini only, and pause markers to Chirp 3 only.

Once the voiceover has been generated, it appears in the timeline below. To compare takes, you can generate multiple voiceovers and use the volume slider to mute the ones you don't want to hear. Here I have added a second voiceover using the Chirp 3 model.

Tips and Q&A

  1. What are the character limits for the voiceover?

    For Gemini 2.5 Pro TTS you can enter a maximum of 2000 characters at once; for Chirp 3 you can enter a maximum of 4000 characters at once.

  2. How can I control the tone or delivery style of the voiceover?

    Use the Voiceover Styling Instructions field. Note that styling instructions work only with Gemini 2.5 Pro TTS. Type natural-language prompts such as “Read this in a dramatic whisper” or “Energetic like a YouTube ad”.

  3. Which model supports styling instructions?

    The Gemini 2.5 TTS model currently supports natural language-based styling.

  4. Can I add pauses in my script?

    Yes. With Chirp 3 you can add:

    1. [pause]

    2. [pause short]

    3. [pause long]

  5. Can I control speed and volume of the voiceover?

    Yes:

    1. Use the Speed Selector to adjust the playback rate (0.25x to 2x)

    2. Use the Volume Gain Slider to increase or reduce loudness (e.g., +15.1 dB)

      Note: These are explicitly supported in Chirp 3 HD, but Gemini might respond to such styling via prompt instructions.

  6. How many languages and voices are available?

    Chirp 3 HD: 30+ HD voices across 30+ languages

    Gemini 2.5 TTS: 30+ voices across 24+ auto-detectable languages

  7. Can I switch accents or select specific voices?

    Yes, use the Language Selector and Voice Selector to customise your output voice and accent.

  8. Can I re-generate a voiceover?

    Yes, you can always revise your script and/or settings and re-generate a new voiceover.

  9. Can I use styling instructions with Chirp 3 HD?

    No, Chirp 3 HD only supports manual markup and pause control. Styling via natural language is not supported.

  10. Does Gemini 2.5 TTS support volume and speed adjustments directly?

    Not through dedicated sliders, but you can indirectly influence them using natural styling prompts (e.g., “Speak slowly with high energy”).

  11. Can I mix different voices or languages in one voiceover?

    No, you cannot mix different voices in one voiceover.

  12. What is the cost of generating an AI Voiceover?

    With either AI model, the cost of generating one voiceover is one generation.
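
For scripts longer than the character limits in question 1, one option is to split the text yourself and generate one voiceover per chunk. The following is a minimal sketch in Python, not a feature of the editor: the function name and the model keys are hypothetical, the limits are the 2000 and 4000 characters quoted above, and it assumes that every character, spaces and pause markers included, counts towards the limit.

```python
import re

# Per-generation character limits quoted in this article.
CHAR_LIMITS = {"gemini": 2000, "chirp3": 4000}


def split_script(script: str, model: str) -> list[str]:
    """Split a script into chunks that each fit the model's character limit.

    Splits at sentence boundaries where possible; a single sentence longer
    than the limit is hard-cut as a last resort.
    """
    limit = CHAR_LIMITS[model]
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit in one chunk on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_script = "Hello world. " * 300
for chunk in split_script(long_script, "gemini"):
    print(len(chunk))
```

Each chunk is a separate generation, so a script split this way uses one generation per chunk.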
