agent-os — your personal agent OS, governed by default

What agent-os is (and isn't)

It isn't the biggest pile of connectors or the flashiest chat UI. agent-os is the orchestration + evaluation + controlled-autonomy + personal-brain spine for your own agents — the layer that makes automation trustworthy.

Trustworthy

Every action is accountable

Each run is recorded as a trace, scored by an evaluation gate, written to a tamper-evident audit log, and — if weak — turned into an improvement proposal.

Private

Local-first, your model

SQLite + the standard library at the core. Bring Ollama (free, local), OpenAI, or Claude with one env var. No bundled keys, no hidden network calls.

Safe

Default-deny autonomy

Read-only tasks auto-run; anything that writes, sends, or deploys — or is ambiguous — is gated for your approval. Nothing privileged slips through.
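A minimal sketch of what a default-deny gate of this kind can look like. The verb lists, the labels, and the function names below are illustrative assumptions, not the classifier agent-os ships (which is described as tool-aware). The point is the shape of the rule: a task auto-runs only when it is positively identified as read-only, and anything unrecognised falls through to approval.

```python
import re

# Hypothetical keyword sets for illustration; the real classifier may differ.
READ_ONLY_VERBS = {"summarize", "list", "show", "read", "search", "compare", "explain"}
PRIVILEGED = {
    "WRITE": {"write", "delete", "drop", "update", "empty", "remove", "create"},
    "SEND": {"send", "email", "post", "publish", "message"},
    "DEPLOY": {"deploy", "release", "restart", "rollback"},
}

def classify(task: str) -> str:
    """Return READ, WRITE, SEND, DEPLOY, or AMBIGUOUS for a task description."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    for label, verbs in PRIVILEGED.items():
        if words & verbs:
            return label          # privileged verbs win over everything else
    if words & READ_ONLY_VERBS:
        return "READ"
    return "AMBIGUOUS"            # nothing recognised: do not guess

def requires_approval(task: str) -> bool:
    """Default-deny: only a positively identified read-only task auto-runs."""
    return classify(task) != "READ"
```

Because privileged verbs are checked first, a mixed task such as "summarize then delete the logs" is still gated.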

Anatomy of agent-os

One layered picture of the whole system. Your transports sit on top; your model plugs in at the bottom; in between, a governed harness turns every request into a traced, scored, risk-gated action. You build the unique top and bottom — agent-os is the trustworthy middle.

You — what you build

Your transports & workflows: CLI · local Web UI · chat (WhatsApp/Slack adapters) · your domain tasks

↕

The spine — one governed entry point

Command Router: transport-agnostic · audited on every call · returns plain text

↓

The governed harness

Capabilities (every action: traced → scored → risk-gated → improved)

🧠 The Brain

Ingest notes & files
BM25 + semantic retrieval
Ground & cite answers

🧩 Skills & Profiles

Open SKILL.md import
researcher · operator · builder · qa
Prompt injection per task

🐝 The Swarm

Decompose a goal
Bounded-parallel sub-jobs
Synthesize one deliverable

🛡️ Governance

Risk gate (default-deny)
Approvals queue
Hash-chained audit
Eval gate (Ninja Harness)
Hooks (secret redaction)
Cost & token metering
Persistent memory

↓

Models — bring your own (one env var)

Ollama: local & free · no API key

OpenAI: gpt-4o-mini & others

Anthropic: Claude models

Echo: deterministic · offline default

Read it top-to-bottom: a request enters through any transport, the Command Router authorizes and audits it, the harness runs it through the capabilities under the governance spine, and a model you chose does the thinking — even a free local one. Swap any layer; the guarantees in the middle never move.
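To make the "one governed entry point" idea concrete, here is a small sketch of a transport-agnostic router. The names `route`, `handlers`, and `audit` are hypothetical and the audit record is deliberately simplified. It shows three properties described on this page: every call is recorded, the reply is always plain text, and a failing handler is caught by an error boundary instead of surfacing a stack trace.

```python
def route(line: str, handlers: dict, audit: list) -> str:
    """Dispatch '/name args' to a handler and always return plain text."""
    name, _, arg = line.strip().partition(" ")
    handler = handlers.get(name)
    if handler is None:
        reply, status = f"Unknown command {name!r}. Try /help.", "unknown"
    else:
        try:
            reply, status = str(handler(arg.strip())), "ok"
        except Exception as exc:      # error boundary: no raw stack trace reaches the caller
            reply, status = f"Command failed: {type(exc).__name__}", "error"
    audit.append({"command": name, "status": status})   # recorded on every call
    return reply
```

Any transport (CLI, web UI, chat adapter) can call the same function with a text line and print or send the returned string.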

Quickstart — one command

Creates a local virtualenv (no sudo, no global state), installs the evaluation gate + agent-os, and tells you exactly what to run next.

# install (or run ./install.sh from a clone)
curl -fsSL https://raw.githubusercontent.com/gagans23/agent-os/main/install.sh | bash

# which model can my machine run?
agent-os doctor
# → detects RAM / Apple-Silicon / NVIDIA VRAM + Ollama, recommends a model

# plug a local, free model (no API key)
export AGENT_OS_PROVIDER=ollama:llama3.1:8b

# open the local web UI — click a button
agent-os ui      # http://127.0.0.1:8765 (auto-picks a free port if busy)

Prefer the terminal? agent-os cmd "/help" lists every command. Teach it something: run agent-os cmd "/learn ~/notes.md", then agent-os cmd "/ask what did I learn?".

How it works — the governed loop

Every command — from the CLI, the web UI, or a chat transport — flows through the same router and the same guarantees.

Command (CLI · UI · chat) → Router (+ your context) → Risk gate (default-deny) → Execute (your model) → Trace (full trajectory) → Score (Ninja Harness) → Improve (propose-only)

Write / send / deploy tasks branch off the risk gate into an approval queue (/approve · /reject) — they never auto-run. Every step is appended to a hash-chained audit log you can verify for tampering.
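The branch into the approval queue can be sketched as follows. `ApprovalQueue`, `handle`, and the reply wording are assumptions made for illustration; only the /approve and /reject verbs come from this page. The sketch shows that a privileged task is parked with an id and runs only when someone approves it.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class ApprovalQueue:
    """Holds gated tasks until a human approves or rejects them."""
    pending: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def submit(self, task: str) -> int:
        job_id = next(self._ids)
        self.pending[job_id] = task
        return job_id

    def approve(self, job_id: int, execute) -> str:
        task = self.pending.pop(job_id)   # KeyError if unknown or already decided
        return execute(task)

    def reject(self, job_id: int) -> str:
        return self.pending.pop(job_id)   # dropped without ever running

def handle(task: str, privileged: bool, queue: ApprovalQueue, execute) -> str:
    """Run read-only work immediately; park privileged work for approval."""
    if privileged:
        job_id = queue.submit(task)
        return f"PENDING #{job_id}: awaiting /approve or /reject"
    return execute(task)
```

Popping the entry on either decision means a job id can be approved or rejected at most once.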

Built module by module

Never boiling the ocean. Each module is local-first and sits behind the same traced → scored → gated spine.

✅ Module 0 — Trust & Governance

Persistent jobs, memory, traces, agent profiles, skills; a default-deny + tool-aware risk classifier; an approval queue; a tamper-evident hash-chained audit log; the evaluation gate; supervisor / health / reliability.

✅ Module 1 — The Brain 🧠

A local knowledge base your agents retrieve from. /learn your notes/files; /ask answers only from your context and is scored for grounding. Hybrid keyword + semantic search when an embedder is configured.

✅ Module 2 — Model onboarding 🧩

One env var (AGENT_OS_PROVIDER) plugs in Ollama / OpenAI / Claude — stdlib HTTP, no SDK. Powers answers, synthesis, and the Brain's embeddings. Plus agent-os doctor, a hardware-aware model advisor.
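The Quickstart value `ollama:llama3.1:8b` suggests a `provider:model` convention in which the model tag can itself contain a colon. A hedged sketch of parsing such a value is below; `parse_provider` is a hypothetical helper, and the `"echo"` fallback name is an assumption based on Echo being listed as the offline default.

```python
from __future__ import annotations

import os

def parse_provider(value: str | None) -> tuple[str, str]:
    """Split 'provider:model' on the FIRST colon only, since model tags may contain colons."""
    if not value:
        return ("echo", "")            # assumed name for the offline default
    provider, _, model = value.partition(":")
    return (provider.strip().lower(), model.strip())

provider, model = parse_provider(os.environ.get("AGENT_OS_PROVIDER"))
```

With `AGENT_OS_PROVIDER=ollama:llama3.1:8b` this yields the provider `ollama` and the model `llama3.1:8b`.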

✅ Module 3 — Easy install + UI 🖥️

A one-command installer and a minimal local web UI (agent-os ui) on the standard-library server — nothing extra to install, localhost-only, driving the same governed router.
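The Quickstart notes that agent-os ui uses port 8765 and auto-picks a free port if that one is busy. One common standard-library way to get that behaviour is sketched here; `pick_port` is a hypothetical name, and this is not necessarily how agent-os does it.

```python
import socket

def pick_port(preferred: int = 8765, host: str = "127.0.0.1") -> int:
    """Return `preferred` if it can be bound on `host`, else an OS-assigned free port."""
    for candidate in (preferred, 0):      # port 0 asks the OS for any free port
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, candidate))
            except OSError:
                continue                  # busy or not permitted: try the fallback
            return sock.getsockname()[1]
    raise RuntimeError("no free port available")
```

The probe socket is closed before the server starts, so another process could take the port in between. A server that binds port 0 itself and then reads back the assigned port avoids that race.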

🚧 Module 4 — Pro-coder + connectors

✅ Compatible with the open SKILL.md standard (import any skills folder via AGENT_OS_SKILLS_PATH). Next: an MCP connector bridge, curated role packs, knowledge-graph import, and coding-agent links.

✅ Module 6 — The governed swarm 🐝

One goal → decompose → run sub-tasks in bounded parallel → synthesize one deliverable. Each sub-task is a real, traced, risk-gated, scored job. Honest concurrency, your model, your machine.

Module 5 — watchers & dashboards (folder/event watchers, trend dashboards, a knowledge-graph view of the Brain) — is next on the roadmap.

Feature deep-dive

🧠 The Brain — your own context

Upload notes, files, or whole folders; agents retrieve from them and answer grounded in your material, not a model's guess. Answers are scored against the source, so ungrounded ones get flagged.

agent-os cmd "/learn To add fractions with the same denominator, add the numerators."
agent-os cmd "/ask how do I add fractions?"
→ Based on your notes: ...                         [PASS · grounding 0.75]
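The Anatomy section lists BM25 as the keyword half of retrieval. For readers who have not met it, this is a compact sketch of Okapi BM25 scoring with the usual defaults `k1 = 1.5` and `b = 0.75` and the non-negative IDF variant. The tokenizer and function names are assumptions for illustration, and the Brain's actual index and parameters may differ.

```python
from __future__ import annotations

import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A document that shares no terms with the query scores exactly zero, which gives a simple cut-off for "nothing relevant in your notes".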

🐝 The governed swarm — parallel, but verified

The decompose → parallel → synthesize pattern, placed under the trust spine. Privileged sub-tasks are gated, never auto-run; the synthesis is scored too.

agent-os swarm "research the top 5 local LLM runtimes; compare license, RAM, speed in a table"
🐝 3 sub-tasks · 2 done · 1 gated · 0 failed
   - [PASS 89] summarize ...
   - [GATED:WRITE] delete the prod database
   Synthesis scored 88.8
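The bounded-parallel step can be sketched with the standard library's thread pool. `run_swarm` and its callback parameters are hypothetical, and decomposition and scoring are left out. What the sketch shows is that privileged sub-tasks are set aside rather than executed, and that `max_workers` caps how many sub-jobs run at once.

```python
from concurrent.futures import ThreadPoolExecutor

def run_swarm(sub_tasks, run_one, is_privileged, synthesize, max_workers=3):
    """Run non-privileged sub-tasks in a bounded pool; gate the rest."""
    gated = [t for t in sub_tasks if is_privileged(t)]
    runnable = [t for t in sub_tasks if not is_privileged(t)]
    # max_workers caps how many sub-jobs execute at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one, runnable))   # order matches `runnable`
    return {
        "done": dict(zip(runnable, results)),
        "gated": gated,
        "deliverable": synthesize(results),
    }
```

Threads suit this pattern when each sub-job mostly waits on a model call; the pool size bounds concurrent requests to the model.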

🩺 doctor — which model can my machine run?

Detects your hardware and recommends the largest local model that comfortably fits, with the exact one-liner to enable it.

agent-os doctor
Machine : Apple Silicon · 16 GB · Metal
✅ Recommended: llama3.1:8b
   export AGENT_OS_PROVIDER=ollama:llama3.1:8b
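A rough sketch of the "largest model that fits" idea follows. The RAM thresholds in `TIERS` are illustrative placeholders, not the values agent-os doctor uses, and the real advisor also considers Apple Silicon and NVIDIA VRAM, which this sketch ignores. `total_ram_gb` and `recommend` are hypothetical names.

```python
import os

# Illustrative tiers only: (minimum RAM in GB, Ollama model tag), largest first.
TIERS = [(64, "llama3.1:70b"), (12, "llama3.1:8b"), (6, "llama3.2:3b")]

def total_ram_gb() -> float:
    """Physical RAM in GB on POSIX systems; 0.0 where sysconf is unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return 0.0

def recommend(ram_gb: float):
    """Return the first (largest) tier the machine meets, or None."""
    for min_gb, model in TIERS:
        if ram_gb >= min_gb:
            return model
    return None
```

Calling `recommend(total_ram_gb())` ties the two together; when nothing fits, falling back to the offline Echo provider would be one reasonable choice.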

🛡️ Trust & Governance — tamper-evident by default

Every command is hash-chained into an audit log; any edit or deletion breaks the chain and is detectable. The risk classifier is default-deny and tool-aware. A global error boundary means you never see a raw stack trace.

agent-os cmd "/risk make the prod table empty"   # → WRITE → REQUIRES APPROVAL
agent-os cmd "/audit"                            # → chain ✅ intact
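For readers new to hash chaining, here is a minimal sketch of the technique using SHA-256. The record layout and the names `append` and `verify` are assumptions, not the agent-os schema. Each entry stores the hash of the previous one, so editing or removing an earlier entry makes verification fail from that point on.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append(log: list, payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})

def verify(log: list) -> bool:
    """Recompute every link; any edit or mid-chain deletion breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

In this sketch, truncating entries from the end of the log is only detectable if the latest hash is also kept somewhere outside the log.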

🧩 Skills — incl. the open SKILL.md standard

Reusable SKILL.md procedures the agent matches and injects into your model's prompt. Compatible with the open Agent Skills format, so you can point at any skills folder and import it with no code — and it runs on whatever model you've configured.

export AGENT_OS_SKILLS_PATH="/path/to/any/skills"
agent-os cmd "/skills"
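As a sketch of "match and inject", the code below parses a SKILL.md file with simple `key: value` frontmatter (for example `name` and `description`) and picks the skill whose description overlaps most with the task. The word-overlap matching, the minimal frontmatter parser, and the function names are assumptions for illustration; agent-os may match skills differently.

```python
import re
from pathlib import Path

def parse_skill(text: str) -> dict:
    """Parse simple 'key: value' frontmatter between '---' lines, plus the body."""
    meta, body = {}, text
    match = re.match(r"^---\s*\n(.*?)\n---\s*\n?(.*)$", text, re.DOTALL)
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
        body = match.group(2)
    return {**meta, "body": body}

def load_skills(root: str) -> list:
    """Read every SKILL.md found under `root`."""
    return [parse_skill(p.read_text(encoding="utf-8")) for p in sorted(Path(root).rglob("SKILL.md"))]

def best_skill(task: str, skills: list):
    """Pick the skill whose description shares the most words with the task."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    def overlap(skill):
        return len(words & set(re.findall(r"[a-z]+", skill.get("description", "").lower())))
    ranked = sorted(skills, key=overlap, reverse=True)
    return ranked[0] if ranked and overlap(ranked[0]) > 0 else None
```

The matched skill's `body` is what would be prepended to the model prompt for that task.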

Command surface

One transport-agnostic command set — the same from the CLI, the web UI, and (later) chat.

| Command | What it does |
| --- | --- |
| `/ping` · `/status` · `/health` | liveness, recent jobs, detailed health checks |
| `/learn <path\|text>` | ingest notes/files into the Brain |
| `/ask <question>` | answer from your knowledge base (grounded + scored) |
| `/run <task>` | read-only auto-runs; write/send/deploy is gated for approval |
| `/swarm <goal>` | decompose → parallel sub-jobs → synthesize one deliverable |
| `/doctor` · `/model` | recommend a local model · show the configured provider |
| `/cost` | cost · latency · token usage rolled up across recent runs |
| `/risk <task>` | show the risk classification for a task |
| `/pending` · `/approve <id>` · `/reject <id>` | the approval queue for privileged actions |
| `/audit` | recent audit entries + chain integrity |
| `/agents` · `/skills` · `/eval` | profiles · skills · run the evaluation suite |
| `/job <id>` · `/trace <id>` | inspect a persisted job and its trajectory + score |
| `/digest` | synthesize a cross-episode insight digest |

CLI verbs: run, cmd, ui, doctor, swarm, skills, memory, health, supervise, daily-eval.
