Every collective noun for animals captures something true about how they move and behave together. A murder of crows isn't just a group — it names the coordinated, intelligent behavior that makes them formidable as a collective. We chose maturity deliberately.
Armature's agents don't just coordinate to complete a task. After every run, the system collects execution traces, runs the DiagnosticAnalyzer against them, and uses the SpecRefiner to rewrite the YAML sections that underperformed. The next run is better. Your workflow doesn't just run. It matures.
“A murder of crows is more dangerous than one.
A maturity of agents is smarter than before.”
Armature isn't a run-once tool. It's a loop. Each step feeds the next, and the next run is smarter than the last.
Roles aren't just labels. They determine execution order, context access, and contribution to the self-improvement health score. A well-designed maturity uses all four.
The information foundation. Researchers query tools, read context, search external sources, and build the knowledge base that downstream agents draw from. They run first — and in parallel when independent.
The production engine. Workers synthesize research into drafts, summaries, reports, code, or structured data. They consume upstream researcher output and produce the artifacts that judges and downstream workers will evaluate.
The quality gate. Judges score output quality, validate against criteria, flag hallucinations, and decide whether a result meets the bar. Only judges contribute to the quorum score in the Implicit Harness Rating (IHR), which makes them the accountability layer.
The control plane. Orchestrators manage multi-stage execution pipelines, route work to specialized subteams, handle branching and conditional logic, and ensure all dependencies are satisfied before downstream stages run.
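As a rough illustration of how roles and dependencies could translate into execution order, here is a minimal sketch. The stage names and the `STAGES` table are hypothetical, and this is not Armature's scheduler; it only shows the idea that independent researchers can share a wave and that downstream stages wait for their inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical stage table (not Armature's spec format):
# stage name -> (role, list of upstream stages it depends on).
STAGES = {
    "search_web": ("researcher", []),
    "read_docs": ("researcher", []),
    "draft_report": ("worker", ["search_web", "read_docs"]),
    "score_report": ("judge", ["draft_report"]),
}

def execution_waves(stages):
    """Group stages into waves. Every stage in a wave has all of its
    dependencies satisfied, so the stages in one wave could run in parallel."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in stages.items()})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(STAGES)
for index, wave in enumerate(waves, start=1):
    roles = sorted({STAGES[name][0] for name in wave})
    print(f"wave {index}: {wave} ({', '.join(roles)})")
```

In this sketch the two researchers land in the first wave together, the worker follows, and the judge runs last.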
AWS AgentCore, LangGraph, and CrewAI let you build agent workflows. Armature does that too — and then automatically improves them across runs using the Implicit Harness Rating loop.
Prediction-verification closes the loop: SpecRefiner declares what it expects each rewrite to fix. The subsequent run confirms whether the fixes held — and which ones missed. The rewriter improves its own judgment over time.
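The prediction-verification idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `Prediction` record, the metric names, and the `verify` helper are hypothetical and do not describe SpecRefiner's real data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    stage: str           # stage the rewrite targeted
    metric: str          # hypothetical metric name, e.g. "judge_score"
    expected_min: float  # value the rewrite is predicted to reach

def verify(predictions, next_run_metrics):
    """Compare predictions with the metrics observed on the following run.
    Returns (held, missed). A metric that was never observed counts as missed."""
    held, missed = [], []
    for prediction in predictions:
        observed = next_run_metrics.get((prediction.stage, prediction.metric))
        if observed is not None and observed >= prediction.expected_min:
            held.append(prediction)
        else:
            missed.append(prediction)
    return held, missed

predictions = [
    Prediction("draft_report", "judge_score", 0.80),
    Prediction("search_web", "schema_pass_rate", 0.95),
]
observed = {
    ("draft_report", "judge_score"): 0.86,
    ("search_web", "schema_pass_rate"): 0.90,
}
held, missed = verify(predictions, observed)
```

Feeding the `missed` list back to the rewriter is what would let it adjust its own judgment on later runs.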
Add --auto-improve to any run. When IHR drops below 0.75, Armature automatically calls SpecRefiner after execution — rewriting prompts, relaxing schemas, rebalancing model tiers, or tuning retry limits. Safe changes apply immediately; structural rewrites stage to {spec}.pending.yaml for human review.
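A minimal sketch of that decision, under stated assumptions: the 0.75 threshold and the `.pending.yaml` suffix come from the description above, while the `structural` flag on each change, the function name, and the choice to derive the pending file name from the spec's stem are all hypothetical.

```python
from pathlib import Path

IHR_THRESHOLD = 0.75  # threshold stated in the text

def plan_auto_improve(ihr, changes, spec_path):
    """Decide what happens to proposed changes after a run.
    Returns (apply_now, staged, pending_path)."""
    if ihr >= IHR_THRESHOLD:
        return [], [], None  # healthy run: no refinement triggered
    # Assumption: each change carries a boolean "structural" flag.
    apply_now = [c for c in changes if not c["structural"]]
    staged = [c for c in changes if c["structural"]]
    spec = Path(spec_path)
    # Assumption: {spec} is the spec file name without its extension.
    pending = spec.with_name(spec.stem + ".pending.yaml") if staged else None
    return apply_now, staged, pending

changes = [
    {"kind": "tune_retry_limit", "structural": False},
    {"kind": "rewrite_prompt", "structural": True},
]
apply_now, staged, pending = plan_auto_improve(0.62, changes, "workflow.yaml")
```

Which kinds of change count as safe and which as structural is Armature's decision; the flag here only stands in for that classification.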
v0.2.0 adds mission anchoring, cross-run continuation, triggers, streaming responses, and governance — making Armature workflows production-grade for services, scheduled jobs, and interactive systems.
Anchors all LLM stages to a stated goal. Prevents drift in long workflows — the system automatically validates that each stage output moves toward the mission.
Cross-run context carry-forward. Start where you left off. Enables incremental research, weekly briefings, or ongoing analysis without resetting state.
Cron and webhook daemon. Schedule workflows (for example, daily at 6 a.m.), react to events, or expose HTTP endpoints that kick off runs. Turn any workflow into a service.
SSE token streaming. Converts any workflow into a streaming API. Perfect for chatbots, real-time UIs, and interactive agents.
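For readers unfamiliar with Server-Sent Events, the wire format is simple: each event is a block of `field: value` lines ended by a blank line. The sketch below frames a token stream that way. The helper names and the closing `done` event are hypothetical, and this says nothing about how Armature itself exposes its stream.

```python
def sse_event(data, event=None):
    """Frame one Server-Sent Event. Multi-line data becomes one
    'data:' line per line, and a blank line terminates the event."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"

def stream_tokens(tokens):
    """Yield one SSE frame per token, then a hypothetical 'done' event."""
    for token in tokens:
        yield sse_event(token)
    yield sse_event("", event="done")

frames = list(stream_tokens(["Hel", "lo"]))
```

A web framework would send these frames with the `text/event-stream` content type, and a browser could read them with `EventSource`.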
Replay any run with cached LLM responses. Debug via transcript, understand exactly why a result was produced, or rerun with different safety rules.
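One common way to make runs replayable is to key each LLM response by a hash of its request, so that a replay reads from the cache instead of calling the model. The sketch below assumes that approach; the `ReplayCache` class and its methods are hypothetical and are not Armature's API.

```python
import hashlib
import json

class ReplayCache:
    """Record LLM responses on a live run; serve them back on replay."""

    def __init__(self, store=None):
        self.store = {} if store is None else store

    @staticmethod
    def key(model, prompt):
        # A stable hash of the request identifies the cached response.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, model, prompt, llm=None):
        """Return the cached response if present. Otherwise call `llm`
        (live run) and record the result, or fail if no `llm` is given."""
        cache_key = self.key(model, prompt)
        if cache_key in self.store:
            return self.store[cache_key]
        if llm is None:
            raise KeyError("no cached response for this request")
        self.store[cache_key] = llm(model, prompt)
        return self.store[cache_key]

calls = []

def fake_llm(model, prompt):
    # Stand-in for a real model call, used only for this example.
    calls.append(prompt)
    return f"echo: {prompt}"

cache = ReplayCache()
live = cache.call("model-a", "summarize", llm=fake_llm)  # live run, recorded
replayed = cache.call("model-a", "summarize")            # replay, no model call
```

Because the replay never touches the model, the same transcript can be re-examined repeatedly at no extra cost.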
armature validate now rates each spec LOW, MEDIUM, HIGH, or CRITICAL before any run. Catch dangerous specs early with governance layers, tool bans, and safety drift detection.
Armature wasn't invented from first principles. It's a synthesis of the best current academic thinking on agent harness design, published between February and May 2026, plus Microsoft's Agent Governance Toolkit, ActiveGraph's event-sourced execution model, and Veldt Labs' KYA trust layer. Every source contributed concrete, implemented capabilities.
Mature has two meanings here. The agents grow smarter every run — and the harness itself matures alongside the field, tracking the latest research as it ships.
The core finding shared across all seven: the harness is more important than the model. Armature ships the harness — production-grade, self-improving, and open source.
Write a YAML spec, point Armature at it, and watch your maturity of agents get to work.
Open-source projects that showcase what Armature can do — security scanning, automated research, and more.
Armature is free, MIT licensed, and built in the open. Fork it, extend it, build on it. Contributions welcome — especially new role types, tool integrations, and self-improvement strategies.
Armature is an open-source project from ElfTech — we build autonomous AI systems for operational work. Explore our full platform at elftech.com.