I've been running an AI agent on my own infrastructure for a while now. No cloud APIs I don't control, no black-box services sending my data to places I can't audit. This is how I set it up, and why.
> Why Run Your Own Agent?
Most AI assistants live on someone else's server. Your prompts, your files, your conversations — all processed by APIs you have no visibility into. That never sat right with me.
Hermes Agent changes that. It's an open-source, self-hostable AI agent that I run inside a Docker container on my own machine. The model runs through Ollama Cloud — specifically GLM-5.1 — so inference happens remotely, but the orchestration, memory, and tool execution stay local. Web search goes through my own SearXNG instance, also containerized. Beyond the prompts sent out for inference, nothing leaves my network unless I explicitly tell it to.
> The Stack
Here's what's actually running:
- Hermes Agent — the orchestrator. Handles conversations, memory, skill execution, cron jobs, and multi-step reasoning. Runs in Docker.
- GLM-5.1 via Ollama Cloud — the language model. Inference is offloaded to Ollama's cloud endpoint so I don't need a local GPU. The model handles reasoning, code generation, and all the agent's decision-making.
- SearXNG — a privacy-respecting meta-search engine. Hermes uses it for web search instead of hitting Google or Bing directly. Self-hosted, also in Docker, no tracking, no API keys leaking to third parties.
- Docker — everything runs containerized. Hermes, SearXNG, the Next.js portfolio you're reading right now — all isolated in their own networks. If something breaks or gets compromised, blast radius is contained.
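A stripped-down `docker-compose.yml` along these lines captures the layout. This is a sketch, not Hermes' documented configuration — the image names, mount path, and `SEARCH_URL` variable are assumptions for illustration:

```yaml
services:
  hermes:
    image: hermes-agent:latest          # assumed image name
    volumes:
      - ./hermes-data:/data             # the only host path the agent can see
    environment:
      - SEARCH_URL=http://searxng:8080  # hypothetical variable name
    networks:
      - agentnet
    restart: unless-stopped

  searxng:
    image: searxng/searxng:latest
    networks:
      - agentnet
    restart: unless-stopped

networks:
  agentnet:
    driver: bridge                      # private network shared by the two services
```

Because both services sit on the same user-defined network, Hermes can reach SearXNG by service name (`searxng`) without either container being exposed to the host network.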
> Why Containerized?
This isn't just about convenience. Security is the real reason.
When you run an AI agent, it can execute shell commands, read and write files, make HTTP requests, and interact with your system. That's extremely powerful — and extremely dangerous if something goes wrong.
By running everything inside Docker containers:

- The agent can't access my host filesystem directly — only mounted volumes
- Network access is scoped to what the container needs
- If a model hallucination triggers a destructive command, the damage is limited to the container
- I can rebuild the entire environment from a `docker-compose.yml` in minutes

This is the same principle as sandboxing a browser. You give the agent enough power to be useful, but you draw hard lines around what it can touch.
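Compose exposes those boundaries directly. A hardened service stanza might look like this — the keys are standard Docker Compose options, while the paths and network name are illustrative:

```yaml
services:
  hermes:
    read_only: true            # immutable root filesystem
    tmpfs:
      - /tmp                   # scratch space that vanishes with the container
    cap_drop:
      - ALL                    # no Linux capabilities beyond the defaults Docker re-adds
    security_opt:
      - no-new-privileges:true # block privilege escalation via setuid binaries
    volumes:
      - ./hermes-data:/data    # the single writable host path
    networks:
      - agentnet               # no host networking, only the private bridge
```

Even if the agent runs a destructive command, it can only scribble inside `/data` and `/tmp`.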
> Ollama Cloud vs. Local Inference
I considered running models locally. The problem? My workstation doesn't have a GPU, and even if it did, running a capable model like GLM-5.1 locally requires serious VRAM. The trade-off isn't worth it for a personal agent that needs to be responsive.
Ollama Cloud gives me the best of both worlds: the model inference happens on remote GPUs, but the agent logic, memory, and tool execution never leave my machine. The only thing that travels over the wire is the prompt and the completion — no persistent data, no session storage on their end.
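Concretely, the request that leaves the machine is small. A minimal sketch of the body an agent would POST to Ollama's standard `/api/chat` endpoint — the model tag is the one used in this post; substitute whatever your Ollama account exposes:

```python
import json

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint.

    This is the entire payload that crosses the wire — the agent's
    memory and tool state never leave the local machine.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete response instead of a token stream
    }

body = build_chat_request("glm-5.1", "Summarize today's cron log.")
print(json.dumps(body, indent=2))
```

An HTTP POST of that body (with `requests`, `urllib`, or Ollama's own client) is all the remote endpoint ever sees.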
It's not fully air-gapped, and I'm honest about that. But for my threat model — personal productivity, not enterprise — the trade-off is acceptable. The sensitive part isn't the model output; it's what the agent can *do* with it.
> SearXNG for Search
Hermes needs web search for research tasks, blog writing, and real-time information. Instead of giving it a Google API key, I run my own SearXNG instance.
SearXNG aggregates results from multiple search engines without tracking queries or building user profiles. It's open-source, lightweight, and runs happily alongside Hermes in Docker. The agent queries it through its built-in web tool, gets clean JSON results, and processes them however the task requires.
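A sketch of what the agent-side search tool boils down to. SearXNG's JSON API typically needs `json` enabled under `search.formats` in `settings.yml`; the hostname and the fields I keep here are illustrative assumptions:

```python
from urllib.parse import urlencode

SEARXNG_URL = "http://searxng:8080"  # assumed in-network service name

def search_url(query: str) -> str:
    """Build a SearXNG JSON-API query URL."""
    return f"{SEARXNG_URL}/search?{urlencode({'q': query, 'format': 'json'})}"

def extract_results(payload: dict) -> list[dict]:
    """Trim a raw SearXNG response down to the fields the agent needs."""
    return [
        {"title": r["title"], "url": r["url"], "snippet": r.get("content", "")}
        for r in payload.get("results", [])
    ]

# Sample response, trimmed to the result fields SearXNG returns.
sample = {"results": [
    {"title": "Docker docs", "url": "https://docs.docker.com", "content": "Official documentation."}
]}
results = extract_results(sample)
```

Fetching `search_url(...)` over HTTP and feeding the body to `extract_results` is the whole loop.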
No API keys. No rate limits I can't control. No search history being sold.
> What Hermes Actually Does
In practice, Hermes handles a lot of my daily workflow:

- Writing and editing these blog posts (yes, this one too)
- Managing scheduled jobs via cron — backups, health checks, monitoring
- Web research with cited sources
- Code generation and review across my projects
- File management and document processing
- Running multi-step research tasks that require search → read → synthesize loops

The skill system is extensible — I can author new skills as markdown files and Hermes picks them up automatically. Each skill knows its own triggers, pitfalls, and tools. It's like having a well-trained intern who actually remembers what you taught them.
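The one-markdown-file-per-skill pattern is simple enough to sketch. This loader mirrors the post's description, but the directory layout and front-matter fields shown are my assumptions, not Hermes' actual schema:

```python
import tempfile
from pathlib import Path

def load_skills(skills_dir: str) -> dict[str, str]:
    """Index every markdown skill file in a directory by filename stem.

    Dropping a new .md file into the directory is enough for it to be
    picked up on the next scan — no registration step.
    """
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in Path(skills_dir).glob("*.md")
    }

# Demonstrate with a throwaway skill file; the fields inside it
# are hypothetical, not Hermes' real skill format.
with tempfile.TemporaryDirectory() as d:
    Path(d, "blog-writer.md").write_text(
        "# blog-writer\ntriggers: write a post, edit a draft\n"
    )
    demo = load_skills(d)
```

A real agent would also parse triggers and tool lists out of each file, but discovery is just a glob.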
> Getting Started
If you want to set this up yourself, the process is straightforward:

1. Install Docker on your machine (Ubuntu 24.04 recommended)
2. Deploy Hermes Agent — pull the container, configure your model provider (I use Ollama Cloud with GLM-5.1)
3. Deploy SearXNG — another container, configure it as a search backend for Hermes
4. Configure tools and skills — Hermes comes with built-in skills and you can author your own
5. Connect your platforms — Telegram, Discord, or just the terminal

The whole setup runs on a single VPS or even a home server. No Kubernetes, no microservices architecture, no overkill. Just containers doing their jobs.
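The provider configuration in step 2 typically lands in an env file next to the compose file. The variable names below are hypothetical placeholders — check the Hermes docs for the real ones:

```
# .env — hypothetical variable names, shown only to illustrate the shape
MODEL_PROVIDER=ollama-cloud
MODEL_NAME=glm-5.1
OLLAMA_API_KEY=your-key-here
SEARCH_BACKEND=searxng
SEARCH_URL=http://searxng:8080
```

Keeping secrets in an env file (and out of the compose file) also means the compose file itself is safe to commit.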
> Final Thoughts
Running your own AI agent isn't about being paranoid — it's about being deliberate. You choose what model runs, where your data goes, and what the agent can access. Hermes, GLM-5.1, SearXNG, and Docker gave me that control without sacrificing capability.
If you're reading this on my site, it was built with Hermes' help, deployed in Docker, and served through Next.js. The agent wrote the code, I reviewed it, and the whole thing runs on infrastructure I control. That's the point.