Needle: Finetune a 26M Tool-Calling Model Locally with Ollama

Needle is a 26-million-parameter encoder-decoder transformer with one job: given a natural-language request and a list of available tools, output the correct tool name and arguments as structured JSON. In this tutorial, Fahd Mirza walks through installing Needle, replacing its default Gemini-based synthetic data generator with a fully local Ollama model, and fine-tuning the model on a custom dataset, entirely offline on a single NVIDIA RTX A6000 GPU.
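To make the task concrete, here is a minimal sketch of the input and output shape described above: a request plus a tool list goes in, and a JSON object naming one tool and its arguments comes out. The tool names come from the tutorial, but the parameter names, the `build_input` serialization, and the `name`/`arguments` keys are assumptions for illustration, not Needle's documented format.

```python
import json

# Tool names are from the tutorial; the parameter schemas are hypothetical.
TOOLS = [
    {"name": "get_weather", "parameters": {"city": "string"}},
    {"name": "set_timer", "parameters": {"minutes": "number"}},
    {"name": "toggle_lights", "parameters": {"room": "string", "on": "boolean"}},
]


def build_input(query, tools):
    """Serialize the request and tool list into one text input (assumed format)."""
    return json.dumps({"query": query, "tools": tools}, sort_keys=True)


def validate_tool_call(raw_output, tools):
    """Check that model output is JSON naming a known tool with known arguments.

    Returns the parsed call, or None when the output is not a valid call.
    """
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    schema = {t["name"]: t["parameters"] for t in tools}
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in schema:
        return None
    if not isinstance(args, dict):
        return None
    # Reject argument names the chosen tool does not declare.
    if set(args) - set(schema[name]):
        return None
    return call
```

A check like this is useful both for filtering synthetic training data and for guarding whatever code executes the model's tool calls at inference time.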

The video includes a clear architectural breakdown of how Needle works. A 12-layer encoder stack reads the full input query using self-attention with grouped-query attention (GQA) and RoPE positional encoding; it deliberately omits the standard feed-forward block to stay compact. A separate 8-layer decoder generates the tool call token by token via masked self-attention, then cross-attends back to the encoder’s representation through a bridge layer before emitting structured JSON output. Weights, training code, and the data pipeline are all released under an MIT license, and the model is explicitly sized for on-device deployment on phones, watches, and glasses.
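The layout described above can be sketched in a few lines of plain Python. This is only a structural illustration, not Needle's code: the layer counts come from the description, while the head counts, the `LayoutSketch` name, and the sublayer labels are hypothetical. The decoder list contains only the sublayers the description names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutSketch:
    encoder_layers: int = 12  # from the description
    decoder_layers: int = 8   # from the description
    n_query_heads: int = 8    # hypothetical; not stated in the source
    n_kv_heads: int = 2       # hypothetical; not stated in the source


def sublayer_plan(cfg):
    """List (stack, layer index, sublayers) for every layer in the model."""
    # Encoder layers are attention-only: no feed-forward block.
    enc = [("encoder", i, ["self_attention_gqa_rope"])
           for i in range(cfg.encoder_layers)]
    # Decoder layers: masked self-attention, then cross-attention to the encoder.
    dec = [("decoder", i, ["masked_self_attention", "cross_attention"])
           for i in range(cfg.decoder_layers)]
    return enc + dec


def kv_head_for(query_head, cfg):
    """GQA: consecutive query heads share a single key/value head."""
    group_size = cfg.n_query_heads // cfg.n_kv_heads
    return query_head // group_size


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]
```

With the assumed 8 query heads and 2 key/value heads, heads 0 to 3 read one shared key/value head and heads 4 to 7 read the other, which is where GQA saves parameters and memory compared with giving every query head its own keys and values.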

Mirza demonstrates generating 432 training examples covering three tools (get_weather, set_timer, and toggle_lights), with roughly 130 natural-language phrasings each plus 14 negative examples where no tool applies. The fine-tuned model is then validated against a held-out split saved to a checkpoint folder. The result is a reproducible pipeline for training a tiny, locally runnable function-calling model without any cloud dependencies or proprietary API keys.
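A local data generator of this kind might look roughly like the sketch below, which asks an Ollama server for phrasings through its `/api/generate` endpoint and then carves off a held-out split. This is not Needle's actual pipeline: the prompt wording, the `qwen3` model tag, the row format, the `null` target for negative examples, and the 10% held-out fraction are all assumptions.

```python
import json
import random
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3"  # placeholder tag (assumption); use whichever model you have pulled


def build_request(tool_name, n_phrasings):
    """Build a non-streaming, JSON-mode generate request for one tool."""
    prompt = (
        f"Write {n_phrasings} different user requests that should trigger the "
        f"tool '{tool_name}'. Reply as JSON: "
        '{"examples": [{"query": "...", "arguments": {}}]}'
    )
    body = {"model": MODEL, "prompt": prompt, "stream": False, "format": "json"}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def parse_examples(tool_name, response_body):
    """Turn one Ollama response into training rows, skipping malformed items."""
    generated = json.loads(json.loads(response_body)["response"])
    if not isinstance(generated, dict):
        return []
    rows = []
    for item in generated.get("examples", []):
        if isinstance(item, dict) and isinstance(item.get("query"), str):
            target = {"name": tool_name, "arguments": item.get("arguments", {})}
            rows.append({"query": item["query"], "target": target})
    return rows


def negative_row(query):
    """A 'no tool applies' example; a null target is an assumed convention."""
    return {"query": query, "target": None}


def generate(tool_name, n_phrasings):
    """Call the local Ollama server (it must already be running)."""
    with urllib.request.urlopen(build_request(tool_name, n_phrasings)) as resp:
        return parse_examples(tool_name, resp.read())


def split(rows, holdout_fraction=0.1, seed=0):
    """Shuffle deterministically and return (train, held_out)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_holdout = max(1, int(len(rows) * holdout_fraction))
    return rows[n_holdout:], rows[:n_holdout]
```

Fixing the shuffle seed keeps the held-out split identical across runs, so validation numbers from different fine-tuning attempts remain comparable.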

📺 Source: Fahd Mirza · Published July 03, 2026
🏷️ Format: Tutorial Demo

Tags

Fahd Mirza, Gemini, Ollama, Qwen
