Descriptions:
David Ondrej walks through the rationale, setup, and practical use of uncensored large language models running locally in 2026. The video opens with a technical explanation of how commercial models like ChatGPT and Claude refuse certain queries — not through hidden system prompt filters, but through training-time alignment techniques including RLHF and fine-tuning — and why this creates genuine friction for legitimate use cases such as cybersecurity red-teaming, medical documentation, legal research, and creative writing.
The practical portion centers on downloading and running uncensored model variants via Ollama, pulling GGUF-quantized files directly from Hugging Face. The primary demonstration uses Super Gemma 4 Uncensored 26B V2, which runs at approximately 200 tokens per second on a 128GB MacBook. Ondrej also explains how to navigate Hugging Face’s model hub to find quantized uncensored variants — noting 179 different quantized versions of Gemma 4 26B alone — and highlights contributors like “Pliny the Liberator” who specialize in uncensored fine-tunes. Smaller options such as Gemma 4 4B are mentioned for users with less VRAM.
The video additionally introduces a GitHub repository Ondrej built — inspired by Andrej Karpathy’s auto-research concept — that systematically tests prompting strategies against commercial AI models to reduce refusals. Viewers interested in self-hosted, privacy-preserving LLM deployments will find the Ollama workflow and Hugging Face navigation guidance immediately actionable, though the jailbreaking automation tool carries platform terms-of-service implications for commercial API use.
📺 Source: David Ondrej · Published May 11, 2026
🏷️ Format: Tutorial Demo







