Run OpenAI’s Internal PII Detection Model Locally – Privacy Filter Setup & Demo

Description:

OpenAI has open-sourced Privacy Filter, the PII detection model they built and deployed internally to sanitize data before it enters their own systems. Released under an Apache 2.0 license and hosted on Hugging Face, the model is now available for anyone to run locally — a notable departure for a lab that keeps most of its core tooling closed. In this hands-on walkthrough, Fahd Mirza installs and runs Privacy Filter on an Ubuntu machine with an NVIDIA RTX A6000, measuring just over 3GB of VRAM consumption — well within reach of consumer hardware or even a CPU with sufficient RAM.
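As a rough sketch of what that local setup looks like, the model can be loaded through the Hugging Face `transformers` token-classification pipeline. The exact repo path isn't given here, so `openai/privacy-filter` below is a placeholder, and the pipeline task is assumed from the video's description of token-level tagging; substitute the ID shown in the walkthrough.

```python
from transformers import pipeline

# Placeholder repo ID; substitute the actual Hugging Face path from the video.
MODEL_ID = "openai/privacy-filter"

# aggregation_strategy="simple" merges BIOES-tagged tokens into whole entity spans.
pii = pipeline(
    "token-classification",
    model=MODEL_ID,
    aggregation_strategy="simple",
    device=0,  # GPU index; use device=-1 (or omit) to run on CPU instead
)

text = "Reach Jane Doe at +1-555-0173 or jane.doe@example.com."
for entity in pii(text):
    print(entity["entity_group"], repr(entity["word"]), round(float(entity["score"]), 3))
```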

The video covers both high-level pipeline usage and low-level token scoring across 33 PII label classes, using a BIOES tagging scheme (Begin, Inside, Outside, End, Single) to mark entity spans at the token level. Mirza demonstrates how production teams can control detection thresholds — requiring 0.99 confidence for medical data pipelines versus 0.85 for log sanitization — rather than relying on the pipeline’s default decisions. A context-aware detection demo shows the model correctly distinguishing a personal phone number from a company hotline or a doctor’s office number, something regex tools cannot reliably do.
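A minimal sketch of that threshold control, assuming the entity dicts produced by the pipeline above (each carrying a `score` field); the profile names are illustrative, not from the video:

```python
from typing import Any

# Per-pipeline confidence thresholds, matching the values cited in the video.
THRESHOLDS = {
    "medical": 0.99,  # stricter bar for medical data pipelines
    "logs": 0.85,     # looser bar for log sanitization
}

def filter_by_confidence(
    entities: list[dict[str, Any]], profile: str
) -> list[dict[str, Any]]:
    """Keep only entity spans whose model confidence clears the profile's bar."""
    threshold = THRESHOLDS[profile]
    return [e for e in entities if float(e["score"]) >= threshold]

# Usage with the `pii` pipeline from the previous sketch:
# spans = filter_by_confidence(pii(text), profile="medical")
```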

The walkthrough concludes with a deployable redaction function that merges tokenizer-split spans and replaces them with placeholders from end to start (preserving string positions), giving viewers production-ready code for integrating PII filtering into any enterprise AI pipeline.
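The function from the video isn't reproduced verbatim here, but a sketch of the same two ideas (merge tokenizer-split spans, then substitute placeholders from end to start so character offsets stay valid) might look like this, again assuming the pipeline's entity dict format:

```python
from typing import Any

def redact(text: str, entities: list[dict[str, Any]]) -> str:
    """Replace detected PII spans with placeholders such as [PHONE_NUMBER]."""
    # 1) Sort by start offset and merge adjacent/overlapping spans, which
    #    re-joins entities the tokenizer split into multiple pieces.
    merged: list[dict[str, Any]] = []
    for e in sorted(entities, key=lambda e: e["start"]):
        if merged and e["start"] <= merged[-1]["end"]:
            merged[-1]["end"] = max(merged[-1]["end"], e["end"])
        else:
            merged.append({
                "start": e["start"],
                "end": e["end"],
                "label": e.get("entity_group") or e.get("entity", "PII"),
            })
    # 2) Substitute from end to start so earlier offsets remain valid
    #    after each replacement changes the string length.
    for span in reversed(merged):
        placeholder = f"[{span['label'].upper()}]"
        text = text[: span["start"]] + placeholder + text[span["end"] :]
    return text

# Usage with the sketches above:
# print(redact(text, filter_by_confidence(pii(text), profile="logs")))
```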


📺 Source: Fahd Mirza · Published May 02, 2026
🏷️ Format: Tutorial Demo
