Descriptions:
Fahd Mirza demonstrates how to install and run GroundedAI, an open-source evaluation framework built to detect hallucinations and measure factual grounding in large language model outputs. The entire setup runs locally on Ubuntu using Ollama as the inference backend, with GLM 4.7 Flash serving as the generation model and GroundedAI’s own fine-tuned judge model (roughly 8 GB in size, consuming around 7.6 GB of VRAM during inference) scoring each response.
The demo walks through pip installation, cloning the GroundedAI GitHub repository, and running a progression of test cases: simple factual questions (capital of France), nonsensical prompts designed to induce hallucination (when did Mount Everest relocate to Australia?), and context-grounded questions about a fictional person. The judge model outputs a full chain-of-thought reasoning trace before assigning a numerical hallucination score and a faithful/unfaithful label, making it easy to audit why a particular response was flagged.
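The three kinds of test case described above can be sketched as plain requests to a local Ollama server. This is a minimal illustration, not the code from the video: it uses Ollama's standard `/api/generate` endpoint for the generation step only and leaves GroundedAI's judging step out. The model tag `glm-4.7-flash`, the fictional person "Dana Reyes", and the helper names `build_payload` and `ask_ollama` are assumptions made for the example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "glm-4.7-flash"  # assumed tag; run `ollama list` to see the real one

# One case per category from the demo. The grounded case uses a made-up
# person and context (hypothetical, not taken from the video).
TEST_CASES = [
    {"kind": "factual",
     "question": "What is the capital of France?",
     "context": None},
    {"kind": "nonsense",
     "question": "When did Mount Everest relocate to Australia?",
     "context": None},
    {"kind": "grounded",
     "question": "Where does Dana Reyes work?",
     "context": "Dana Reyes is a fictional marine biologist "
                "who works at the Harbor Institute."},
]


def build_payload(case, model=MODEL_TAG):
    """Build the JSON body for a non-streaming /api/generate request."""
    prompt = case["question"]
    if case["context"]:
        # Ask the model to stay inside the supplied context.
        prompt = (
            "Answer using only this context:\n"
            f"{case['context']}\n\n"
            f"Question: {case['question']}"
        )
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(payload, url=OLLAMA_URL, timeout=120):
    """POST the payload to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With Ollama running and the model pulled, `ask_ollama(build_payload(TEST_CASES[1]))` would return the model's answer to the Mount Everest prompt, which is the text a judge model would then score.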
Mirza notes that GroundedAI also supports OpenAI and Anthropic models as alternative judges for teams that prefer API-based evaluation rather than running the model locally. The framework is particularly valuable for production RAG pipelines, where knowing whether a model is staying within its provided context versus confabulating is critical before deployment. The video includes enough hardware detail (VRAM monitoring, model sizes, Ubuntu commands) for developers to reproduce the setup on comparable hardware.
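For readers who want to check VRAM usage on their own machine, one possible approach is to query `nvidia-smi` from Python. This sketch assumes an NVIDIA GPU with `nvidia-smi` on the PATH; the description does not say which monitoring tool the video uses, and the helper names here are hypothetical.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse `nvidia-smi` CSV output (one MiB value per GPU) into integers."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def gpu_memory_used_mib():
    """Return the memory currently in use on each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi is missing or fails
    )
    return parse_memory_used(result.stdout)
```

Calling `gpu_memory_used_mib()` before and after loading the judge model gives a rough measure of how much VRAM it occupies.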
📺 Source: Fahd Mirza · Published May 16, 2026
🏷️ Format: Tutorial Demo







