Click a button and agents come up and do the work โ research, triage, drafting, monitoring โ grounded in your own context, with every action traced, scored, risk-gated, and improved. Runs on your machine, on your model.
A non-technical person should be able to install it; a pro coder should be able to augment their work with it.
It isn't the biggest pile of connectors or the flashiest chat UI. agent-os is the orchestration + evaluation + controlled-autonomy + personal-brain spine for your own agents โ the layer that makes automation trustworthy.
Each run is recorded as a trace, scored by an evaluation gate, written to a tamper-evident audit log, and โ if weak โ turned into an improvement proposal.
SQLite + the standard library at the core. Bring Ollama (free, local), OpenAI, or Claude with one env var. No bundled keys, no hidden network calls.
Read-only tasks auto-run; anything that writes, sends, or deploys โ or is ambiguous โ is gated for your approval. Nothing privileged slips through.
One layered picture of the whole system. Your transports sit on top; your model plugs in at the bottom; in between, a governed harness turns every request into a traced, scored, risk-gated action. You build the unique top and bottom โ agent-os is the trustworthy middle.
SKILL.md importRead it top-to-bottom: a request enters through any transport, the Command Router authorizes and audits it, the harness runs it through the capabilities under the governance spine, and a model you chose does the thinking โ even a free local one. Swap any layer; the guarantees in the middle never move.
Creates a local virtualenv (no sudo, no global state), installs the evaluation gate + agent-os, and tells you exactly what to run next.
# install (or run ./install.sh from a clone) curl -fsSL https://raw.githubusercontent.com/gagans23/agent-os/main/install.sh | bash # which model can my machine run? agent-os doctor # โ detects RAM / Apple-Silicon / NVIDIA VRAM + Ollama, recommends a model # plug a local, free model (no API key) export AGENT_OS_PROVIDER=ollama:llama3.1:8b # open the local web UI โ click a button agent-os ui # http://127.0.0.1:8765 (auto-picks a free port if busy)
Prefer the terminal? agent-os cmd "/help" lists every command.
Teach it something: agent-os cmd "/learn ~/notes.md" then agent-os cmd "/ask what did I learn?"
Every command โ from the CLI, the web UI, or a chat transport โ flows through the same router and the same guarantees.
Write / send / deploy tasks branch off the risk gate into an
approval queue (/approve ยท /reject) โ they never auto-run. Every step is appended to a
hash-chained audit log you can verify for tampering.
Never boiling the ocean. Each module is local-first and sits behind the same traced โ scored โ gated spine.
Persistent jobs, memory, traces, agent profiles, skills; a default-deny + tool-aware risk classifier; an approval queue; a tamper-evident hash-chained audit log; the evaluation gate; supervisor / health / reliability.
A local knowledge base your agents retrieve from. /learn your notes/files; /ask answers only from your context and is scored for grounding. Hybrid keyword + semantic search when an embedder is configured.
One env var (AGENT_OS_PROVIDER) plugs in Ollama / OpenAI / Claude โ stdlib HTTP, no SDK. Powers answers, synthesis, and the Brain's embeddings. Plus agent-os doctor, a hardware-aware model advisor.
A one-command installer and a minimal local web UI (agent-os ui) on the standard-library server โ nothing extra to install, localhost-only, driving the same governed router.
โ
Compatible with the open SKILL.md standard (import any skills folder via AGENT_OS_SKILLS_PATH). Next: an MCP connector bridge, curated role packs, knowledge-graph import, and coding-agent links.
One goal โ decompose โ run sub-tasks in bounded parallel โ synthesize one deliverable. Each sub-task is a real, traced, risk-gated, scored job. Honest concurrency, your model, your machine.
Module 5 โ watchers & dashboards (folder/event watchers, trend dashboards, a knowledge-graph view of the Brain) โ is next on the roadmap.
Upload notes, files, or whole folders; agents retrieve from them and answer grounded in your material, not a model's guess. Answers are scored against the source, so ungrounded ones get flagged.
agent-os cmd "/learn To add fractions with the same denominator, add the numerators." agent-os cmd "/ask how do I add fractions?" โ Based on your notes: ... [PASS ยท grounding 0.75]
The decompose โ parallel โ synthesize pattern, placed under the trust spine. Privileged sub-tasks are gated, never auto-run; the synthesis is scored too.
agent-os swarm "research the top 5 local LLM runtimes; compare license, RAM, speed in a table" ๐ 3 sub-tasks ยท 2 done ยท 1 gated ยท 0 failed - [PASS 89] summarize ... - [GATED:WRITE] delete the prod database Synthesis scored 88.8
Detects your hardware and recommends the largest local model that comfortably fits, with the exact one-liner to enable it.
agent-os doctor Machine : Apple Silicon ยท 16 GB ยท Metal โ Recommended: llama3.1:8b export AGENT_OS_PROVIDER=ollama:llama3.1:8b
Every command is hash-chained into an audit log; any edit or deletion breaks the chain and is detectable. The risk classifier is default-deny and tool-aware. A global error boundary means you never see a raw stack trace.
agent-os cmd "/risk make the prod table empty" # โ WRITE โ REQUIRES APPROVAL agent-os cmd "/audit" # โ chain โ intact
Reusable SKILL.md procedures the agent matches and injects into your model's prompt. Compatible with the open Agent Skills format, so you can point at any skills folder and import it with no code โ and it runs on whatever model you've configured.
export AGENT_OS_SKILLS_PATH="/path/to/any/skills" agent-os cmd "/skills"
One transport-agnostic command set โ the same from the CLI, the web UI, and (later) chat.
| Command | What it does |
|---|---|
/ping ยท /status ยท /health | liveness, recent jobs, detailed health checks |
/learn <path|text> | ingest notes/files into the Brain |
/ask <question> | answer from your knowledge base (grounded + scored) |
/run <task> | read-only auto-runs; write/send/deploy is gated for approval |
/swarm <goal> | decompose โ parallel sub-jobs โ synthesize one deliverable |
/doctor ยท /model | recommend a local model ยท show the configured provider |
/cost | cost ยท latency ยท token usage rolled up across recent runs |
/risk <task> | show the risk classification for a task |
/pending ยท /approve <id> ยท /reject <id> | the approval queue for privileged actions |
/audit | recent audit entries + chain integrity |
/agents ยท /skills ยท /eval | profiles ยท skills ยท run the evaluation suite |
/job <id> ยท /trace <id> | inspect a persisted job and its trajectory + score |
/digest | synthesize a cross-episode insight digest |
CLI verbs: run, cmd, ui, doctor, swarm, skills, memory, health, supervise, daily-eval.
The non-negotiables that make agent-os different.
Diagrams + the full module map.
Ingestion, retrieval, grounding, hybrid search.
Providers, the doctor, opt-in wiring.
Decompose โ parallel โ synthesize, verified.
SKILL.md, the open standard, import.
The installer and the local web UI.
What's done and what's planned.
Threat model + known limitations.
How we review and keep quality up.