
How can I generate images with consistent, accurate product shots?

When generating images that include real products, the product is the source of truth. Colors, labels, typography, and proportions all need to stay consistent and on-brand. Use these six principles to help you achieve consistent product shots in Pencil.

Written by Michael Whyle
Updated over a week ago

Set Your Resolution to 4K From the Start

Generate at 4K from the beginning, rather than treating resolution as something to fix or upscale later in the workflow.

Resolution sets the ceiling for everything downstream, and generating at a lower resolution and then upscaling rarely recovers what's lost.

Select Nano Banana Pro as Your Model

When choosing a model, select Nano Banana Pro, which is the state-of-the-art model for product fidelity as of February 2026.

It's specifically optimised for generative image quality and product fidelity. For pack shot and product shoot work, it's the right tool for the job.

Use a clean reference image

Upload a high-resolution shot of the actual product directly into the chat, and explicitly instruct the agent to reference it. Give the model the visual source material it needs upfront: a high-resolution picture of a clearly visible, isolated product on a plain matte background, with no motion blur and with good lighting free of glare or reflections.

Mirror your prompt to the reference

Though this may seem counter-intuitive and redundant, it really helps to 'double up': in addition to the image reference described above, use the text prompt to describe the product in terms that match what's visible in the reference image.

Specificity helps with this. Vague descriptors like "a shampoo bottle" or "a red can" give the model too much room for interpretive drift. Something like "a tall, matte-black cylindrical bottle with a gold pump dispenser and white serif lettering reading 'RACELLE SERUM' across the centre" works much better.

If you're not sure how to describe your product accurately, you can always give the image to another AI agent or LLM chatbot and ask it to write a detailed visual description for you.
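One way to keep descriptions this specific is to assemble them from the attributes you can see in the reference image. The snippet below is a minimal sketch, not a Pencil feature: `describe_product` and its parameters are hypothetical names used only for illustration, and the resulting string is simply text you would paste into your prompt.

```python
def describe_product(form, features, label_style, label_text, label_position):
    """Join visible product attributes into one specific description.

    Hypothetical helper for illustration only. Every argument should
    describe something that is actually visible in the reference image.
    """
    lettering = f"{label_style} lettering reading '{label_text}' {label_position}"
    # Features and lettering are joined with "and" so nothing visible is left out.
    details = " and ".join([*features, lettering])
    return f"{form} with {details}"


description = describe_product(
    form="a tall, matte-black cylindrical bottle",
    features=["a gold pump dispenser"],
    label_style="white serif",
    label_text="RACELLE SERUM",
    label_position="across the centre",
)
print(description)
```

Filling in each field forces you to name the shape, finish, hardware, and label text explicitly, which is exactly what a vague descriptor like "a shampoo bottle" leaves out.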

Expect small text to be imperfect

Fine print (ingredients, barcodes, regulatory text) is a known limitation at any resolution. Bold headline text survives 4K well; dense label copy often doesn't. If legible fine print is a hard requirement, compositing that element separately is the more reliable route.

Multiple products need multiple image references and descriptions

Each SKU needs its own dedicated reference image. A combined shot or collage spreads the model's attention too thin. Upload each product separately and reference each one individually in your prompt, describing it and where you want it placed.
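For a multi-product scene, the same idea can be sketched as a small prompt builder that gives every SKU its own reference, description, and placement. This is an illustrative sketch under stated assumptions: `build_scene_prompt`, the file names, and the second product are all hypothetical, and the output is plain prompt text rather than a call to any Pencil API.

```python
def build_scene_prompt(scene, products):
    """Build prompt text with one entry per SKU.

    Hypothetical helper for illustration only.
    products: list of (reference_name, description, placement) tuples,
    one tuple per separately uploaded reference image.
    """
    if not products:
        raise ValueError("at least one product is required")
    lines = [scene]
    for number, (reference, description, placement) in enumerate(products, start=1):
        # Each product names its own reference image, description, and placement.
        lines.append(
            f"Product {number} (use reference image '{reference}'): "
            f"{description}. Placement: {placement}."
        )
    return "\n".join(lines)


prompt = build_scene_prompt(
    "A bathroom shelf in soft morning light.",
    [
        # File names and the second product are made up for this example.
        ("serum.png", "a tall, matte-black cylindrical bottle with a gold pump dispenser", "left of centre"),
        ("cream.png", "a short white jar with a matte-black lid", "right of centre, slightly forward"),
    ],
)
print(prompt)
```

Keeping one tuple per SKU makes it hard to forget a reference or a placement when the scene grows beyond two products.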


Quick reference

| What to do | Why |
| --- | --- |
| Generate at 4K | Sets the quality ceiling from the start |
| Use Nano Banana Pro | Optimised for product fidelity and generative image quality |
| Upload a clean reference image | Gives the model a visual anchor, not just a description |
| Mirror prompt language to the product | Prevents drift when model and reference alone aren't enough |
| Accept small text limitations | Plan compositing if fine print is critical |
| One reference per SKU | Maintains fidelity across multi-product scenes |
