How Do You Measure the Accuracy and Grounding of AI Responses?

Written by Sarah Bradley
Updated over 7 months ago

At Pencil, we ground AI-generated responses with an optimised Retrieval-Augmented Generation (RAG) pipeline that works across various large language models (LLMs), and we measure their accuracy against the documents in the brand library. This keeps responses based on reliable, brand-specific information.

How We Measure Performance

We assess the performance of the brand library by asking a series of standardised chat-based questions. These questions help us test whether the AI can:

  1. Recall – Can the AI accurately retrieve relevant information from the documents uploaded to the brand library?

  2. Apply – Can the AI apply the correct facts from the brand documents to generate relevant, accurate responses?

By using these benchmarks, we ensure that the AI responses are grounded in the correct data and align with the brand’s guidelines.
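To make the two checks above more concrete, here is a minimal sketch of how a recall/apply benchmark could be scored. It is an illustration only, not Pencil's actual implementation: the names `BenchmarkQuestion`, `fact_coverage`, `run_benchmark`, and the `ask` callable are hypothetical, and simple case-insensitive fact matching stands in for whatever scoring method is really used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BenchmarkQuestion:
    """One standardised chat question (hypothetical structure)."""
    kind: str                  # "recall" or "apply"
    prompt: str                # the question sent to the AI
    expected_facts: List[str]  # facts from the brand documents the answer should contain


def fact_coverage(answer: str, expected_facts: List[str]) -> float:
    """Return the fraction of expected facts found in the answer (case-insensitive)."""
    if not expected_facts:
        return 0.0
    text = answer.lower()
    hits = sum(1 for fact in expected_facts if fact.lower() in text)
    return hits / len(expected_facts)


def run_benchmark(
    questions: List[BenchmarkQuestion],
    ask: Callable[[str], str],
) -> Dict[str, float]:
    """Ask every question and return the average fact coverage per question kind."""
    scores: Dict[str, List[float]] = {}
    for question in questions:
        answer = ask(question.prompt)  # `ask` is an assumed wrapper around the chat AI
        score = fact_coverage(answer, question.expected_facts)
        scores.setdefault(question.kind, []).append(score)
    return {kind: sum(values) / len(values) for kind, values in scores.items()}


if __name__ == "__main__":
    # Stand-in for a real chat call; the answer text is made up for the demo.
    def demo_ask(prompt: str) -> str:
        return "Our Teal palette is used across all campaigns."

    demo_questions = [
        BenchmarkQuestion("recall", "What is the brand's primary colour?", ["teal"]),
        BenchmarkQuestion("apply", "Write a tagline in the brand voice.", ["teal", "friendly"]),
    ]
    print(run_benchmark(demo_questions, demo_ask))
```

In this sketch, a "recall" score near 1.0 would suggest the AI retrieves the right facts from the uploaded documents, while a lower "apply" score would suggest those facts are not being carried into generated responses.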

For more details or questions, don’t hesitate to contact our support team!
