What Video Generation Models Does Pencil Support?

Introduction

Pencil offers access to a growing set of advanced Video Generation models that help you bring ideas to life through motion. These tools are well suited to creating short-form video content, animatics, and visual sequences, especially when combined with reference frames, style inputs, or prompts grounded in your Brand Library.

Some of the use cases include:

Generating video from a text prompt to visualise campaign ideas
Creating short-form video assets for social or ad formats
Sequencing still images into motion-based storyboards or animatics
Experimenting with motion-first storytelling for conceptual work
Producing short, repeatable motion clips for use in branded content or concept tests

What Models Does Pencil Support?

For Video Generation, Pencil supports the following models:

Google Gemini Omni Flash
Google Veo 3.1
Google Veo 3
Google Veo 2
Adobe Firefly Video
Kling AI VIDEO 2.5 Turbo
Kling AI VIDEO 2.1
Kling AI VIDEO 3
Kling AI VIDEO 3 OMNI
Runway Gen-4 Turbo
Runway Gen-3 Alpha Turbo
Sora 2
Sora 2 Pro

Here we've highlighted the newest additions as well as those that get used the most by brand teams and Pencil internal creatives.

The Kling 3 and Veo 3.1 model families are currently by far the strongest options for the most common marketing and advertising tasks that creatives handle on the platform. These models are natively multimodal and offer superior motion, coherence, prompt adherence, and fidelity to input references.

Kling 3 Omni is the first model in Pencil to support video reference inputs as part of its guidance context for video generation, in addition to the text prompts and image inputs that are now standard in frontier video models.

That said, two related capabilities are upcoming features and are not currently supported on the platform: making targeted, surgical edits to a video via text prompts, and using an existing video as the input for an agent-driven transformation or 'remix'.

We are always working to add to our list of supported models. Note that the models available to you may vary depending on your workspace configuration.

Summary

Results can vary based on input format (text, image, or frame sequence), model selection, and prompt structure. We recommend testing a few different models to find the best fit for your intended tone, pacing, and level of realism.

When working from reference frames or brand-aligned visuals, model choice is just one factor to consider. Prompt structure, frame sequencing, and visual context often have equal or greater influence on the final result.

Cost

See: How are Generations Counted?