AgentMemory + Hermes Agent + Ollama = AI Agent That Never Forgets | Fully Local Setup

AgentMemory + Hermes Agent + Ollama = AI Agent That Never Forgets | Fully Local Setup

More

Descriptions:

Fahd Mirza demonstrates a fully local setup that gives AI coding agents persistent memory across sessions, combining the AgentMemory tool, the Hermes agent framework, and Ollama running Qwen 3.6 — all on an Ubuntu system with an Nvidia RTX 5600 GPU (48GB VRAM), with no cloud API required.

AgentMemory is built on what its developers call the Triple-I Engine, a four-tier memory architecture modeled on human cognition: raw observations from every tool call get compressed into episodic summaries, then into semantic facts, and finally into procedural patterns. Retrieval combines BM25 keyword search, vector similarity, and a knowledge graph in a triple-stream system that the project claims achieves 95.2% accuracy on the LongMemEval benchmark. The video walks through the full installation process, Hermes configuration (editing config.yml at line 330 to register AgentMemory as both a memory provider and an MCP server with 43 available tools), and a live browser dashboard on port 3113 that shows memories accumulating in real time as the agent works.

Mirza flags one friction point: the Hermes setup wizard no longer lists Ollama as a named provider, requiring users to select “Custom Direct API” and manually enter the Ollama-compatible OpenAI endpoint. He also notes that the default BM25-only mode skips LLM-based compression — functional for demos but less capable than the full semantic pipeline recommended for production use with OpenAI, Anthropic, or OpenRouter-hosted models.


📺 Source: Fahd Mirza · Published May 26, 2026
🏷️ Format: Tutorial Demo

1 Item

Channels