How can I generate images with consistent, accurate product shots?

Set your resolution to 4K from the start

Generate at 4K from the beginning, rather than treating resolution as a fix or an upscaling step applied later.

Resolution sets the ceiling for everything downstream, and generating at a lower resolution and then upscaling rarely recovers what's lost.

Select Nano Banana Pro as your model

When choosing a model, select Nano Banana Pro, the state-of-the-art model for product fidelity as of February 2026.

It's specifically optimised for generative image quality and product fidelity. For pack shot and product shoot work, it's the right tool for the job.

Use a clean reference image

Upload a high-resolution shot of the actual product directly into the chat, and explicitly instruct the agent to reference it. This gives the model the visual source material it needs upfront. The ideal reference is a high-resolution picture of a clearly visible, isolated product on a plain matte background, with no motion blur and with good lighting free of glare or reflections.

Mirror your prompt to the reference

It may seem counter-intuitive and redundant, but it really helps to 'double up': in addition to the reference image described above, use the text prompt to describe the product in terms that match what's visible in that image.

Specificity helps here. Vague descriptors like "a shampoo bottle" or "a red can" give the model too much room for interpretive drift. Something like "a tall, matte-black cylindrical bottle with a gold pump dispenser and white serif lettering reading 'RACELLE SERUM' across the centre" works much better.

If you're not sure how to describe your product accurately, you can always give the image to another AI agent or LLM chatbot and ask it to write a detailed visual description for you.
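If you write product descriptions often, it can help to assemble them from a short list of visible attributes so nothing gets left out. The sketch below is only an illustration in plain Python: the `ProductDescription` class and its field names are hypothetical, and it is not tied to any particular image tool or API. It simply rebuilds the example description above from its parts.

```python
from dataclasses import dataclass, field


@dataclass
class ProductDescription:
    """Hypothetical helper: the visible attributes of one product."""
    item: str                      # what the object is, e.g. "bottle"
    descriptors: list[str]         # size, finish, colour, shape, as seen in the reference
    features: list[str] = field(default_factory=list)  # distinctive parts and label text

    def to_prompt(self) -> str:
        # Join the descriptors in front of the item: "a tall, matte-black cylindrical bottle"
        text = f"a {', '.join(self.descriptors)} {self.item}"
        # Append distinctive features, joined with "and"
        if self.features:
            text += " with " + " and ".join(self.features)
        return text


serum = ProductDescription(
    item="bottle",
    descriptors=["tall", "matte-black cylindrical"],
    features=[
        "a gold pump dispenser",
        "white serif lettering reading 'RACELLE SERUM' across the centre",
    ],
)
print(serum.to_prompt())
```

Filling in each field while looking at the reference image is a simple way to check that the text and the image agree before you paste the result into your prompt.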

Expect small text to be imperfect

Fine print (ingredients, barcodes, regulatory text) is a known limitation at any resolution. Bold headline text usually holds up well at 4K; dense label copy often doesn't. If legible fine print is a hard requirement, compositing that element separately is the more reliable route.

Multiple products need multiple image references and descriptions

Each SKU needs its own dedicated reference image. A combined shot or collage spreads the model's attention too thin. Upload each product separately and reference each one individually in your prompt, describing both the product and where you want it placed.
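For multi-product scenes, one way to keep references, descriptions, and placements paired up is to build the prompt one SKU at a time. The following is a minimal sketch in plain Python; `SkuReference`, `build_scene_prompt`, the file names, and the second product are all hypothetical, and the output is just text to paste alongside your uploaded images, not a call to any real API.

```python
from dataclasses import dataclass


@dataclass
class SkuReference:
    """Hypothetical record pairing one uploaded reference with its description."""
    reference_file: str  # name of the separately uploaded reference image
    description: str     # text that mirrors what is visible in that image
    placement: str       # where the product should sit in the scene


def build_scene_prompt(scene: str, skus: list[SkuReference]) -> str:
    """Return a prompt with one line per SKU, each naming its own reference."""
    if not skus:
        raise ValueError("At least one SKU reference is required")
    lines = [scene]
    for index, sku in enumerate(skus, start=1):
        lines.append(
            f"Product {index} (reference image: {sku.reference_file}): "
            f"{sku.description}, placed {sku.placement}."
        )
    return "\n".join(lines)


prompt = build_scene_prompt(
    "Two products on a pale marble counter in soft daylight.",
    [
        SkuReference(
            "serum_front.png",
            "a tall, matte-black cylindrical bottle with a gold pump dispenser",
            "on the left, facing the camera",
        ),
        SkuReference(
            "cream_jar_front.png",
            "a short, frosted-glass jar with a gold screw lid",
            "on the right, slightly behind the bottle",
        ),
    ],
)
print(prompt)
```

Keeping one entry per product makes it harder to forget a reference or a placement when a scene grows to three or four SKUs.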

Quick reference

What to do	Why
Generate at 4K	Sets the quality ceiling from the start
Use Nano Banana Pro	Optimised for product fidelity and generative image quality
Upload a clean reference image	Gives the model a visual anchor, not just a description
Mirror prompt language to the product	Prevents drift when the reference image alone isn't enough
Accept small text limitations	Plan compositing if fine print is critical
One reference per SKU	Maintains fidelity across multi-product scenes