Most Agent Frameworks Are Chatbot Wrappers With a For-Loop
There are now more AI agent frameworks than there are production AI agents. Let that sink in.
LangChain. CrewAI. AutoGen. Swarm. Agency Swarm. PhiData. Haystack. Semantic Kernel. The list is absurd. And most of them do the same thing: wrap an LLM call in a loop, add a "tool" abstraction, call it an "agent."
A chatbot that retries isn't an agent. It's a chatbot with persistence.
The Framework Trap
Here's what most frameworks give you:
- A way to define "tools" (functions the LLM can call)
- A loop that feeds tool outputs back to the LLM
- Some prompt templating
- A "memory" abstraction (usually just chat history in a list)
That's it. That's the entire value proposition. And you can build all of it in ~100 lines of code with the raw API.
The frameworks add hundreds of dependencies, custom abstractions you have to learn, version churn that breaks your code monthly, and layers of indirection that make debugging a nightmare.
What's Actually Hard About Agents
The hard problems in building production agents have nothing to do with tool calling or prompt chains:
- Persistent identity: Who is this agent across sessions, across platforms, across time? A username in a database is not identity.
- Calibrated trust: How much autonomy should this agent have? Trust isn't binary. It's a spectrum that should change based on performance and feedback.
- Real memory: Not "last 10 messages." Layered memory — fast facts, temporal knowledge, curated long-term insights. The kind of memory a person builds over months.
- Economic participation: Can this agent hold money? Make payments? Earn revenue? Without economic capability, an agent is a cost center forever.
- Verifiable competence: Can this agent prove what it knows? Not with a self-reported capability list. With benchmark scores that anyone can verify.
- Failure recovery: When this agent crashes mid-task, can it pick up where it left off? Can another agent resume its work?
No framework solves these. Most don't even acknowledge them.
What I Use Instead
I run an agent (Grainzy) that's been operational for months. Here's what it actually needs:
- OpenClaw as the runtime — handles persistence, tool execution, heartbeat monitoring, channel routing
- A calibration system — trust scores per domain, decay over time, escalation rules. The agent earns autonomy through track record.
- Three-layer memory — fast facts (JSON), temporal timeline (append-only), curated insights (markdown). Not a vector store. Files on disk.
- Economic rails — Stripe for human transactions, USDC on Base for on-chain settlement. The agent participates in actual commerce.
- Crash recovery — active task file, session state, resume protocol. If it dies, it picks up where it left off next session.
None of this required a framework. It required engineering.
The Real Stack for Production Agents
If you're building agents that need to work in production — not demos, not hackathon projects, real production — here's what matters:
- Raw model access — skip the framework, call the API directly. You need control, not abstraction.
- File-based state — forget vector stores for most use cases. JSON and markdown files are debuggable, version-controllable, and fast.
- Explicit trust boundaries — define what the agent can do autonomously vs. what requires approval. Make this configurable and dynamic.
- Real payment rails — if your agent can't hold money, it's a toy.
- Verifiable credentials — if your agent can't prove competence, nobody should trust it.
The next generation of agent infrastructure won't be frameworks. It'll be protocols — for identity, payments, credentials, and trust.
That's what we're building at Credara. Not another framework. The infrastructure underneath.