This 100% uncensored AI model is insane… let’s run it

Tutorials4 days ago

This 100% uncensored AI model is insane… let’s run it

Descriptions:

David Ondrej walks through the rationale, setup, and practical use of uncensored large language models running locally in 2026. The video opens with a technical explanation of how commercial models like ChatGPT and Claude refuse certain queries — not through hidden system prompt filters, but through training-time alignment techniques including RLHF and fine-tuning — and why this creates genuine friction for legitimate use cases such as cybersecurity red-teaming, medical documentation, legal research, and creative writing.

The practical portion centers on downloading and running uncensored model variants via Ollama, pulling GGUF-quantized files directly from Hugging Face. The primary demonstration uses Super Gemma 4 Uncensored 26B V2, which runs at approximately 200 tokens per second on a 128GB MacBook. Ondrej also explains how to navigate Hugging Face’s model hub to find quantized uncensored variants — noting 179 different quantized versions of Gemma 4 26B alone — and highlights contributors like “Pliny the Liberator” who specialize in uncensored fine-tunes. Smaller options such as Gemma 4 4B are mentioned for users with less VRAM.

The video additionally introduces a GitHub repository Ondrej built — inspired by Andrej Karpathy’s auto-research concept — that systematically tests prompting strategies against commercial AI models to reduce refusals. Viewers interested in self-hosted, privacy-preserving LLM deployments will find the Ollama workflow and Hugging Face navigation guidance immediately actionable, though the jailbreaking automation tool carries platform terms-of-service implications for commercial API use.

📺 Source: David Ondrej · Published May 11, 2026
🏷️ Format: Tutorial Demo

1 Item

Channels

No Image Available

David Ondrej

Tags

Andrej Karpathy Anthropic Claude Claude Code Gemma 4 llama.cpp Ollama Open Router OpenAI

Prev

Two Roads to Durable Agents: Replay vs. Snapshot — Eric Allam, Trigger.dev

Next

TurboQuant + DFlash: Supercharge Local LLM Speed

18 Related Posts

Related Posts

14:38

Tutorials

Using HiDream-O1 Natively in ComfyUI

1 hour ago

14:22

Tutorials

Codex Mobile Released and It’s Insane

1 hour ago

08:17

Tutorials

OpenAI Codex Now Works from Anywhere (Dispatch Killer?)

1 day ago

10:54

Tutorials

Talkie: I Ran a 1930 AI Model Locally and Talked to People from the Past

1 day ago

03:02

Tutorials

Installing Claude Code

1 day ago

08:41

Tutorials

Luce DFlash Meets OpenClaw – Local AI Agents at 2x Speed with Qwen3.6-27B

2 days ago