
I Built an AI Agent That Manages My Homelab

By Merox, HPC Sysadmin · 7 min read · AI · Intermediate

At 2am when a pod crashes, I don’t want to open a laptop. I want to send a message to Telegram and get an answer. That’s what merox-agent does — it’s an AI agent that runs on my VPS, knows my entire infrastructure, and can act on it.

The code is on GitHub: github.com/meroxdotdev/merox-agent

It’s built on the Claude Agent SDK and authenticates with my Claude Code account, so there is no separate API key and no per-token billing.


What it can do

| Command | Result |
| --- | --- |
| what pods are down? | Runs kubectl get pods -A, filters not-running |
| show me traefik logs | docker logs traefik --tail 50 |
| reconcile flux-system | flux reconcile kustomization flux-system |
| how much disk is left? | Disk, memory, CPU load from the VPS |
| what changed in the infra repo? | git log --oneline -10 |
| restart the n8n container | docker restart n8n (asks to confirm first) |

It responds in whichever language you write — Romanian or English.

[Screenshot: infrastructure status overview from Telegram]
[Screenshot: Jellyfin logs check from Telegram]
[Screenshot: confirmation before a destructive operation]

Architecture

Phone / Laptop
  ├── Telegram (@meroxagentbot)
  └── client.py ──────────────► HTTP (Tailscale)
                                      │
                             service.py (FastAPI)
                                      │
                           Claude Agent SDK (Python)
                                      │
                               Claude Code CLI
                                      │
                         ┌────────────┴────────────┐
                      kubectl                    docker
                      talosctl                   git
                      flux                       systemctl

The key piece is the Claude Agent SDK — it runs Claude Code as a subprocess, injects a system prompt with full infrastructure context, and exposes tools (kubectl, docker, git, etc.) that Claude can call. The FastAPI server wraps this in an HTTP + SSE API, and the Telegram bot is wired to the same endpoint.
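To make the HTTP + SSE step concrete, here is a minimal sketch of how agent events could be framed as Server-Sent Events before being streamed to a client. It uses only the standard library. The event dictionaries and the names `to_sse`, `fake_agent_events`, and `stream_events` are assumptions for illustration, not code from the merox-agent repo.

```python
import asyncio
import json
from typing import AsyncIterator


def to_sse(event: dict) -> str:
    """Serialize one event as an SSE frame: a 'data:' line plus a blank line."""
    return f"data: {json.dumps(event)}\n\n"


async def fake_agent_events() -> AsyncIterator[dict]:
    """Stand-in for the agent loop; yields hypothetical event dicts."""
    yield {"type": "tool_call", "command": "kubectl get pods -A"}
    yield {"type": "text", "text": "All pods are running."}


async def stream_events() -> list[str]:
    """Collect the SSE frames a streaming HTTP response would send."""
    return [to_sse(event) async for event in fake_agent_events()]


if __name__ == "__main__":
    for frame in asyncio.run(stream_events()):
        print(frame, end="")
```

In a FastAPI service, an async generator that yields frames like these would typically be handed to a `StreamingResponse` with the media type `text/event-stream`.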

Claude Code authenticates via OAuth (your Claude account), so there’s no API key to manage on the server.


Prerequisites

  • A VPS with Ubuntu 22.04+ (I use Oracle Cloud Free Tier)
  • Tailscale running on the server and your devices
  • kubectl, flux, talosctl installed if you have a Kubernetes cluster
  • A Telegram bot token from @BotFather
  • A Claude account (Pro or Team — needed for Claude Code)
  • Python 3.11+, Node.js 20+

Setup

1. Install Claude Code

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
sudo npm install -g @anthropic-ai/claude-code
# Authenticate (opens browser OAuth)
claude
Note: Claude Code needs to be authenticated as the user that will run the agent. Do this before running install.sh.

2. Clone and configure

sudo git clone https://github.com/meroxdotdev/merox-agent /srv/merox-agent
cd /srv/merox-agent
sudo cp .env.example .env
sudo nano .env

Fill in:

SERVER_NAME=my-vps
SERVER_TS_IP=100.x.x.x # tailscale ip -4
INFRA_REPO=/srv/kubernetes/infrastructure
WEBSITE_REPO=/srv/merox
TELEGRAM_BOT_TOKEN=123456:ABC...
TELEGRAM_USER_ID=123456789 # get from @userinfobot
Warning: TELEGRAM_USER_ID is the whitelist. Only messages from this ID are processed, so don’t skip it.
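As a rough illustration of the whitelist, the sketch below parses `.env`-style text and checks a sender ID against `TELEGRAM_USER_ID`. The helper names `parse_env` and `is_authorized` are hypothetical, and the parsing rules (including how inline `#` comments are stripped) are an assumption; the actual repo may load its configuration differently.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop a trailing inline comment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def is_authorized(sender_id: int, env: dict[str, str]) -> bool:
    """Allow only the single Telegram user ID configured in .env."""
    allowed = env.get("TELEGRAM_USER_ID", "")
    # A missing or malformed ID rejects everyone instead of allowing everyone.
    return allowed.isdigit() and sender_id == int(allowed)


if __name__ == "__main__":
    env = parse_env("TELEGRAM_USER_ID=123456789 # get from @userinfobot\n")
    print(is_authorized(123456789, env), is_authorized(42, env))
```

The check fails closed: if the variable is missing, no sender is authorized, which is the behavior you want from a whitelist.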

3. Install

sudo bash install.sh

This creates a Python virtualenv, installs dependencies, writes the systemd service, and starts it.

4. Verify

systemctl status merox-agent
journalctl -u merox-agent -f

Then message your bot on Telegram. It should respond within a few seconds.


How it works

The system prompt (prompt.py) is what makes this actually useful. It tells Claude exactly what infrastructure exists: server IPs, which Docker containers are running, where the Kubernetes cluster config lives, what apps are deployed. Without this context, it would be a generic assistant. With it, it knows your specific setup.

# prompt.py (simplified)
import config  # assumed: module exposing the values loaded from .env

SYSTEM_PROMPT = f"""
You are an infrastructure agent for merox.dev.

Server: {config.SERVER_NAME} ({config.SERVER_TS_IP})
Docker services: traefik, pihole, portainer, garage, netdata, homepage
Kubernetes: Talos OS, 3 nodes, Cilium CNI, Longhorn storage, FluxCD v2
Apps: jellyfin, radarr, sonarr, n8n, grafana, prometheus, loki...

Environment:
KUBECONFIG={config.KUBECONFIG}
TALOSCONFIG={config.TALOSCONFIG}

Rules:
- Never read: age.key, *.sops.yaml, kubeconfig, .env files
- Show plan before destructive commands
- Prefer GitOps over direct kubectl apply
"""

Tool calls flow like this: Claude decides it needs to run kubectl get pods -A, calls the run_kubectl tool, gets the output, and formats a response. All of this happens inside the Claude Agent SDK’s query() loop.

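# service.py (simplified)
from claude_agent_sdk import ClaudeAgentOptions, query

from prompt import SYSTEM_PROMPT  # the system prompt from prompt.py

# ALL_TOOLS is assumed to be defined elsewhere in service.py
# (the list of tool names Claude is allowed to call).


async def run_agent(message: str, session_id: str):
    """Stream agent events for one user message."""
    options = ClaudeAgentOptions(
        system_prompt=SYSTEM_PROMPT,
        allowed_tools=ALL_TOOLS,
        permission_mode="default",
        model="claude-opus-4-6",
        resume=session_id,  # maintains conversation context
    )
    async for event in query(prompt=message, options=options):
        yield event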
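The tool-call step described above (Claude asks for a command, the tool runs it, the output goes back to the model) can be sketched with a plain subprocess wrapper. This is an illustration under assumptions, not the repo's implementation: `run_tool` and this version of `run_kubectl` are hypothetical helpers, and registering them with the SDK is left out.

```python
import subprocess
import sys


def run_tool(argv: list[str], timeout: float = 30.0) -> str:
    """Run a command without a shell and return stdout plus stderr as text."""
    try:
        result = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # a non-zero exit is reported to the model, not raised
        )
    except FileNotFoundError:
        return f"error: {argv[0]} is not installed"
    except subprocess.TimeoutExpired:
        return f"error: {argv[0]} timed out after {timeout:.0f}s"
    output = (result.stdout + result.stderr).strip()
    return output or f"(no output, exit code {result.returncode})"


def run_kubectl(*args: str) -> str:
    """Hypothetical kubectl tool: prepends the binary name to the arguments."""
    return run_tool(["kubectl", *args])


if __name__ == "__main__":
    # Demonstrated with the Python interpreter so it runs without a cluster.
    print(run_tool([sys.executable, "--version"]))
```

Passing an argument list instead of a shell string avoids shell interpolation of whatever the model produces, and returning errors as text lets the model explain the failure instead of crashing the request.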

Sessions persist across messages using resume=session_id. Send /clear in Telegram to reset.
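One way to picture the session handling is a small map from Telegram chat ID to the session ID that should be resumed, with /clear dropping the entry. The sketch below keeps that map in memory. `SessionStore` and `handle_message` are hypothetical names, and how the real service obtains and stores session IDs is not shown in this post.

```python
from typing import Optional


class SessionStore:
    """Maps a chat ID to the agent session ID it should resume."""

    def __init__(self) -> None:
        self._sessions: dict[int, str] = {}

    def get(self, chat_id: int) -> Optional[str]:
        """Return the stored session ID, or None for a fresh conversation."""
        return self._sessions.get(chat_id)

    def remember(self, chat_id: int, session_id: str) -> None:
        """Record the session ID to resume on the next message."""
        self._sessions[chat_id] = session_id

    def clear(self, chat_id: int) -> None:
        """Forget the session so the next message starts from scratch."""
        self._sessions.pop(chat_id, None)


def handle_message(store: SessionStore, chat_id: int, text: str) -> Optional[str]:
    """Return the session ID to resume for this message, handling /clear."""
    if text.strip() == "/clear":
        store.clear(chat_id)
        return None
    return store.get(chat_id)


if __name__ == "__main__":
    store = SessionStore()
    store.remember(1, "abc")
    print(handle_message(store, 1, "show me traefik logs"))
    print(handle_message(store, 1, "/clear"))
```

An in-memory dictionary loses its contents when the service restarts, so a persistent variant would need to write the map to disk.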


Connecting from your laptop

pip install httpx
git clone https://github.com/meroxdotdev/merox-agent
cd merox-agent
echo "AGENT_SERVER_URL=http://<SERVER_TS_IP>:8765" > .env
python3 client.py

Or just use the Telegram bot — no setup needed.


Security

The agent runs over Tailscale only — port 8765 is not exposed publicly. A few other guardrails:

  • File blocklist: age.key, *.sops.yaml, kubeconfig, id_rsa, .env can’t be read or written by the agent
  • Dangerous command blocking: patterns like rm -rf /, mkfs, dd if= are rejected at the tool level
  • Telegram whitelist: only your TELEGRAM_USER_ID can interact with the bot
  • Confirmation for destructive ops: the system prompt instructs Claude to show a plan and wait for confirmation before stopping critical services
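The first two guardrails in the list above could look roughly like this. The file names and command patterns come from the list; the function names `is_blocked_path` and `is_dangerous` and the exact regular expressions are assumptions made for this sketch, not the checks the repo actually ships.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File names (glob patterns) the agent must never read or write.
BLOCKED_FILES = ["age.key", "*.sops.yaml", "kubeconfig", "id_rsa", ".env"]

# Command patterns rejected before anything is executed.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),  # rm -rf / (the root itself)
    re.compile(r"\bmkfs\b"),              # mkfs, mkfs.ext4, ...
    re.compile(r"\bdd\s+if="),            # dd if=...
]


def is_blocked_path(path: str) -> bool:
    """True if the file name of the path matches a blocked pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in BLOCKED_FILES)


def is_dangerous(command: str) -> bool:
    """True if the command matches any of the dangerous patterns."""
    return any(p.search(command) for p in DANGEROUS_PATTERNS)


if __name__ == "__main__":
    print(is_blocked_path("/srv/kubernetes/infrastructure/age.key"))
    print(is_dangerous("dd if=/dev/zero of=/dev/sda"))
    print(is_dangerous("kubectl get pods -A"))
```

Pattern matching like this is easy to sidestep (for example `rm -rf /*` does not match the first pattern), so it works as a safety net against obvious mistakes and not as a security boundary.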
Warning: The Claude Code CLI permissions file at .claude/settings.json grants Bash/Read/Write access broadly. This is intentional, because the agent needs it to operate. The security model relies on Tailscale and the bot whitelist, not on restricting Claude’s tools.


What to back up

| Item | Location | Critical? |
| --- | --- | --- |
| Agent code | GitHub | |
| Server .env | /srv/merox-agent/.env | ⚠️ Manual backup |
| AGE key | /srv/kubernetes/infrastructure/age.key | ⚠️ Manual backup |
| K8s manifests | GitHub (SOPS-encrypted) | |

If you lose the server, rebuilding takes about 5 minutes: clone, fill .env, run install.sh, authenticate Claude Code. Everything else is in git.


Dependencies

claude-agent-sdk # Claude Agent SDK
fastapi # HTTP server
uvicorn # ASGI runner
python-telegram-bot # Telegram integration

That’s it. Four packages.


Turns out the hardest part of building this was writing the system prompt — getting Claude to understand the full infrastructure context and behave predictably (confirm before destructive ops, prefer GitOps, never touch secrets). The SDK and Telegram wiring were straightforward by comparison.

If you have a homelab with more than a handful of services, the “open laptop, SSH, run command” loop gets old fast. This replaced most of it.


Why not OpenClaw?

OpenClaw is worth knowing about. It’s a general-purpose personal AI assistant — runs on your machine, connects to WhatsApp/Telegram/Discord/Slack, has persistent memory, browser control, a community skill ecosystem, and supports multiple models including local ones. It’s genuinely impressive and moving fast.

I actually wrote a separate guide on deploying OpenClaw on Proxmox — including the security hardening you need before exposing it to anything real.

So why build something custom instead?

| | merox-agent | OpenClaw |
| --- | --- | --- |
| Purpose | Infrastructure only | General purpose |
| Memory | Session-based | Persistent across sessions |
| Chat apps | Telegram | WhatsApp, Telegram, Discord, Slack, iMessage |
| Browser | No | Yes (browser control) |
| Models | Claude only | Claude, GPT, local |
| Cost | Claude Code account (no per-token billing) | API key required (pay per token) |
| Skills | kubectl, docker, flux, git baked in | Community skills + self-extending |
| Proactive | Reactive only | Heartbeats, cron jobs |
| Setup | install.sh, 4 packages | More complex |
| Security guardrails | Baked in (file blocklist, confirm destructive) | Manual configuration |

Honest answer: if you want a general AI assistant that also does some homelab stuff, OpenClaw is probably the better choice. More capable, more surface area, active community building new skills constantly.

merox-agent made sense for me because I wanted one thing done well — infra management with specific guardrails (never touch age.key, prefer GitOps over direct apply, always confirm before restarting critical services). OpenClaw can do this too, but you’d need to wire it yourself. I wanted it working in an hour with no ongoing configuration overhead.

If you’re starting from scratch today and want more than just infra control, look at OpenClaw first.


Full source: github.com/meroxdotdev/merox-agent
