local-first · Ollama-first · your model · traced · scored · gated · Apache-2.0 · Python 3.11+

Your personal agent OS,
governed by default.

Click a button and agents come up and do the work (research, triage, drafting, monitoring), grounded in your own context, with every action traced, scored, risk-gated, and improved. Runs on your machine, on your model.

A non-technical person should be able to install it; a pro coder should be able to augment their work with it.

What agent-os is (and isn't)

It isn't the biggest pile of connectors or the flashiest chat UI. agent-os is the orchestration + evaluation + controlled-autonomy + personal-brain spine for your own agents: the layer that makes automation trustworthy.

Trustworthy

Every action is accountable

Each run is recorded as a trace, scored by an evaluation gate, written to a tamper-evident audit log, and, if weak, turned into an improvement proposal.

Private

Local-first, your model

SQLite + the standard library at the core. Bring Ollama (free, local), OpenAI, or Claude with one env var. No bundled keys, no hidden network calls.

Safe

Default-deny autonomy

Read-only tasks auto-run; anything that writes, sends, or deploys, or is ambiguous, is gated for your approval. Nothing privileged slips through.

Anatomy of agent-os

One layered picture of the whole system. Your transports sit on top; your model plugs in at the bottom; in between, a governed harness turns every request into a traced, scored, risk-gated action. You build the unique top and bottom; agent-os is the trustworthy middle.

You: what you build
Your transports & workflows: CLI · local Web UI · chat (WhatsApp/Slack adapters) · your domain tasks
The spine: one governed entry point
Command Router: transport-agnostic · audited on every call · returns plain text
The governed harness
Capabilities: every action traced → scored → risk-gated → improved

🧠 The Brain

  • Ingest notes & files
  • BM25 + semantic retrieval
  • Ground & cite answers

🧩 Skills & Profiles

  • Open SKILL.md import
  • researcher · operator · builder · qa
  • Prompt injection per task

🐝 The Swarm

  • Decompose a goal
  • Bounded-parallel sub-jobs
  • Synthesize one deliverable

🛡️ Governance

  • Risk gate (default-deny)
  • Approvals queue
  • Hash-chained audit
  • Eval gate (Ninja Harness)
  • Hooks (secret redaction)
  • Cost & token metering
  • Persistent memory
Models: bring your own (one env var)
Ollama: local & free · no API key
OpenAI: gpt-4o-mini & others
Anthropic: Claude models
Echo: deterministic · offline default

Read it top-to-bottom: a request enters through any transport, the Command Router authorizes and audits it, the harness runs it through the capabilities under the governance spine, and a model you chose (even a free local one) does the thinking. Swap any layer; the guarantees in the middle never move.

Quickstart: one command

Creates a local virtualenv (no sudo, no global state), installs the evaluation gate + agent-os, and tells you exactly what to run next.

# install (or run ./install.sh from a clone)
curl -fsSL https://raw.githubusercontent.com/gagans23/agent-os/main/install.sh | bash

# which model can my machine run?
agent-os doctor
# → detects RAM / Apple-Silicon / NVIDIA VRAM + Ollama, recommends a model

# plug a local, free model (no API key)
export AGENT_OS_PROVIDER=ollama:llama3.1:8b

# open the local web UI, then click a button
agent-os ui      # http://127.0.0.1:8765 (auto-picks a free port if busy)

Prefer the terminal? agent-os cmd "/help" lists every command. Teach it something: agent-os cmd "/learn ~/notes.md" then agent-os cmd "/ask what did I learn?"
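
The AGENT_OS_PROVIDER value above has the shape provider:model, and the model name can itself contain a colon (llama3.1:8b). As a rough sketch of how such a value could be read, the snippet below splits on the first colon only. The ProviderSpec type, the parse_provider helper, and the fallback to "echo" when the variable is unset are assumptions made for illustration, not agent-os's actual code.

```python
import os
from typing import NamedTuple


class ProviderSpec(NamedTuple):
    provider: str  # e.g. "ollama"
    model: str     # e.g. "llama3.1:8b" (may itself contain colons)


def parse_provider(value: str, default: str = "echo") -> ProviderSpec:
    """Split 'provider:model' on the FIRST colon only (hypothetical helper)."""
    value = (value or default).strip()
    # partition() splits once, so "ollama:llama3.1:8b" keeps "llama3.1:8b" whole.
    provider, _, model = value.partition(":")
    return ProviderSpec(provider.lower(), model)


# Assumption: an unset variable falls back to an offline "echo" provider.
spec = parse_provider(os.environ.get("AGENT_OS_PROVIDER", ""))
print(spec)
```

Splitting once keeps model tags intact, so the same parser would work for any provider whose model names include a version suffix.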

How it works: the governed loop

Every command, whether from the CLI, the web UI, or a chat transport, flows through the same router and the same guarantees.

Command (CLI · UI · chat)
→ Router (+ your context)
→ Risk gate (default-deny)
→ Execute (your model)
→ Trace (full trajectory)
→ Score (Ninja Harness)
→ Improve (propose-only)

Write / send / deploy tasks branch off the risk gate into an approval queue (/approve · /reject); they never auto-run. Every step is appended to a hash-chained audit log you can verify for tampering.
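
To make the default-deny branch concrete, here is a minimal sketch of a gate that auto-runs only tasks it can positively identify as read-only and queues everything else, including ambiguous tasks. The keyword lists, the Risk enum, and the classify and gate helpers are illustrative assumptions; the real classifier is described as tool-aware and is certainly more involved.

```python
import re
from enum import Enum


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    SEND = "send"
    DEPLOY = "deploy"
    UNKNOWN = "unknown"


# Illustrative keyword lists only; not agent-os's real rules.
_PRIVILEGED = {
    Risk.WRITE: {"write", "delete", "drop", "update", "empty", "remove"},
    Risk.SEND: {"send", "email", "post", "publish"},
    Risk.DEPLOY: {"deploy", "release", "rollout"},
}
_READ_ONLY = {"read", "list", "show", "summarize", "search", "compare", "research"}


def classify(task: str) -> Risk:
    words = set(re.findall(r"[a-z]+", task.lower()))
    # Privileged keywords win over read-only ones ("read then delete" is WRITE).
    for risk, keywords in _PRIVILEGED.items():
        if words & keywords:
            return risk
    if words & _READ_ONLY:
        return Risk.READ
    return Risk.UNKNOWN  # ambiguous: denied by default


def gate(task: str, approvals: list[str]) -> str:
    """Auto-run only positively identified read-only tasks; queue the rest."""
    risk = classify(task)
    if risk is Risk.READ:
        return "auto-run"
    approvals.append(task)  # waits for an explicit approve or reject
    return f"gated:{risk.value}"


queue: list[str] = []
print(gate("summarize my notes", queue))
print(gate("make the prod table empty", queue))
print(gate("do the thing", queue))
```

The point of the design is the last branch of classify: a task that matches nothing is treated as privileged rather than waved through.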

Built module by module

Never boiling the ocean. Each module is local-first and sits behind the same traced → scored → gated spine.

✅ Module 0: Trust & Governance

Persistent jobs, memory, traces, agent profiles, skills; a default-deny + tool-aware risk classifier; an approval queue; a tamper-evident hash-chained audit log; the evaluation gate; supervisor / health / reliability.

✅ Module 1: The Brain 🧠

A local knowledge base your agents retrieve from. /learn your notes/files; /ask answers only from your context and is scored for grounding. Hybrid keyword + semantic search when an embedder is configured.

✅ Module 2: Model onboarding 🧩

One env var (AGENT_OS_PROVIDER) plugs in Ollama / OpenAI / Claude over stdlib HTTP, with no SDK. Powers answers, synthesis, and the Brain's embeddings. Plus agent-os doctor, a hardware-aware model advisor.

✅ Module 3: Easy install + UI 🖥️

A one-command installer and a minimal local web UI (agent-os ui) on the standard-library server: nothing extra to install, localhost-only, driving the same governed router.

🚧 Module 4: Pro-coder + connectors

✅ Compatible with the open SKILL.md standard (import any skills folder via AGENT_OS_SKILLS_PATH). Next: an MCP connector bridge, curated role packs, knowledge-graph import, and coding-agent links.

✅ Module 6: The governed swarm 🐝

One goal → decompose → run sub-tasks in bounded parallel → synthesize one deliverable. Each sub-task is a real, traced, risk-gated, scored job. Honest concurrency, your model, your machine.

Module 5 (watchers & dashboards: folder/event watchers, trend dashboards, a knowledge-graph view of the Brain) is next on the roadmap.

Feature deep-dive

🧠 The Brain: your own context

Upload notes, files, or whole folders; agents retrieve from them and answer grounded in your material, not a model's guess. Answers are scored against the source, so ungrounded ones get flagged.

agent-os cmd "/learn To add fractions with the same denominator, add the numerators."
agent-os cmd "/ask how do I add fractions?"
→ Based on your notes: ...                         [PASS · grounding 0.75]
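
The keyword half of retrieval is listed elsewhere on this page as BM25. As a sketch of what that ranking looks like, the snippet below scores a handful of notes against a question using the common BM25 formula with a Lucene-style smoothed IDF. The tokenizer, the k1 and b defaults, and the function names are assumptions for illustration; agent-os's own retriever may differ in all of them.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document (higher means more relevant)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)

    # Document frequency: in how many documents does each term appear?
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF: always positive, larger for rarer terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is length-normalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


notes = [
    "To add fractions with the same denominator, add the numerators.",
    "Ollama runs local models.",
    "SQLite is an embedded database.",
]
scores = bm25_scores("how do I add fractions?", notes)
best = max(range(len(notes)), key=scores.__getitem__)
print(notes[best])
```

A hybrid setup would combine these scores with embedding similarity when an embedder is configured; that part is left out here.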

🐝 The governed swarm: parallel, but verified

The decompose → parallel → synthesize pattern, placed under the trust spine. Privileged sub-tasks are gated, never auto-run; the synthesis is scored too.

agent-os swarm "research the top 5 local LLM runtimes; compare license, RAM, speed in a table"
🐝 3 sub-tasks · 2 done · 1 gated · 0 failed
   - [PASS 89] summarize ...        - [GATED:WRITE] delete the prod database
   Synthesis scored 88.8
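
The bounded-parallel step can be sketched with the standard library's thread pool: privileged sub-tasks are held back, the rest run with a fixed concurrency limit, and the results are merged into one deliverable. The run_swarm function, its parameters, and the lambda stand-ins for the model call and the risk gate are hypothetical; the sketch omits tracing and scoring entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_swarm(
    sub_tasks: list[str],
    run_one: Callable[[str], str],
    is_privileged: Callable[[str], bool],
    max_workers: int = 2,
) -> dict:
    """Run non-privileged sub-tasks in a bounded pool; hold the rest for approval."""
    gated = [t for t in sub_tasks if is_privileged(t)]
    runnable = [t for t in sub_tasks if not is_privileged(t)]

    # max_workers is the concurrency bound: at most this many run at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {task: pool.submit(run_one, task) for task in runnable}
    # Leaving the `with` block waits for every submitted future to finish.

    done, failed = {}, {}
    for task, future in futures.items():
        try:
            done[task] = future.result()
        except Exception as exc:  # one failed sub-task should not sink the rest
            failed[task] = repr(exc)

    synthesis = "\n".join(f"- {task}: {out}" for task, out in done.items())
    return {"done": done, "gated": gated, "failed": failed, "synthesis": synthesis}


report = run_swarm(
    ["summarize runtime A", "summarize runtime B", "delete the prod database"],
    run_one=lambda task: task.upper(),            # stand-in for a model call
    is_privileged=lambda task: "delete" in task,  # stand-in for the risk gate
)
print(len(report["done"]), "done,", len(report["gated"]), "gated,", len(report["failed"]), "failed")
```

Keeping the gate check outside the pool means a privileged sub-task is never submitted for execution in the first place.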

🩺 doctor: which model can my machine run?

Detects your hardware and recommends the largest local model that comfortably fits, with the exact one-liner to enable it.

agent-os doctor
Machine : Apple Silicon · 16 GB · Metal
✅ Recommended: llama3.1:8b
   export AGENT_OS_PROVIDER=ollama:llama3.1:8b

🛡️ Trust & Governance: tamper-evident by default

Every command is hash-chained into an audit log; any edit or deletion breaks the chain and is detectable. The risk classifier is default-deny and tool-aware. A global error boundary means you never see a raw stack trace.

agent-os cmd "/risk make the prod table empty"   # → WRITE → REQUIRES APPROVAL
agent-os cmd "/audit"                            # → chain ✅ intact
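
For readers who want to see why a hash chain is tamper-evident, here is a minimal sketch: each entry stores the hash of the previous entry, so changing or removing an earlier entry invalidates everything after it. The entry layout, the all-zero genesis value, and the append and verify helpers are assumptions for illustration, not agent-os's on-disk format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def _entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list[dict], payload: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "payload": payload,
                "hash": _entry_hash(prev_hash, payload)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edit or mid-chain deletion fails the check."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit: list[dict] = []
for command in ("/ping", "/learn notes.md", "/ask what did I learn?"):
    append(audit, {"command": command})

print(verify(audit))                       # chain intact
audit[1]["payload"]["command"] = "/ping"   # tamper with a middle entry
print(verify(audit))                       # chain broken
```

In this sketch, removing entries from the tail is only detectable if the latest hash is also recorded somewhere outside the log.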

🧩 Skills, incl. the open SKILL.md standard

Reusable SKILL.md procedures the agent matches and injects into your model's prompt. Compatible with the open Agent Skills format, so you can point at any skills folder and import it with no code, and it runs on whatever model you've configured.

export AGENT_OS_SKILLS_PATH="/path/to/any/skills"
agent-os cmd "/skills"

Command surface

One transport-agnostic command set: the same from the CLI, the web UI, and (later) chat.

Command                                   What it does
/ping · /status · /health                 liveness, recent jobs, detailed health checks
/learn <path|text>                        ingest notes/files into the Brain
/ask <question>                           answer from your knowledge base (grounded + scored)
/run <task>                               read-only auto-runs; write/send/deploy is gated for approval
/swarm <goal>                             decompose → parallel sub-jobs → synthesize one deliverable
/doctor · /model                          recommend a local model · show the configured provider
/cost                                     cost · latency · token usage rolled up across recent runs
/risk <task>                              show the risk classification for a task
/pending · /approve <id> · /reject <id>   the approval queue for privileged actions
/audit                                    recent audit entries + chain integrity
/agents · /skills · /eval                 profiles · skills · run the evaluation suite
/job <id> · /trace <id>                   inspect a persisted job and its trajectory + score
/digest                                   synthesize a cross-episode insight digest

CLI verbs: run, cmd, ui, doctor, swarm, skills, memory, health, supervise, daily-eval.

Principles

The non-negotiables that make agent-os different.

Local-first, dependency-light. SQLite + standard library at the core; heavy infrastructure is optional and pluggable.
Pluggable, never faked. You bring the model and the transports; agent-os ships the structure, gating, and scoring. No bundled keys, no hidden calls, no stubbed integrations.
Default-deny autonomy. Read-only auto-runs; anything ambiguous or that writes/sends/deploys needs human approval.
Everything leaves a trace, a score, and an improvement. That's how the system compounds, and how you can trust it.

Go deeper

Architecture

Diagrams + the full module map.

The Brain

Ingestion, retrieval, grounding, hybrid search.

Model onboarding

Providers, the doctor, opt-in wiring.

The governed swarm

Decompose → parallel → synthesize, verified.

Skills

SKILL.md, the open standard, import.

Install + UI

The installer and the local web UI.

Roadmap

What's done and what's planned.

Security

Threat model + known limitations.

Engineering

How we review and keep quality up.