How Datai Network went from orchestration chaos to a reliable multi-agent system.
Wed, Feb 11
By: Datai Network
Most AI agent projects don’t fail because the model isn’t smart enough. They fail because everything around the model is brittle: orchestration, state, streaming, retries, and the unglamorous work of making multiple agents behave like one coherent product.
That was the reality for Crunchie, Datai Network’s multi-agent assistant for DeFi intelligence. The breakthrough wasn’t a new prompt or a clever chain-of-thought trick. It was getting the system right: coordination, state, and grounded data.
Datai Network builds structured blockchain intelligence: data that is usable, queryable, and safer to build products on.
Crunchie is an AI Agent experience on top of that foundation. Users ask for yield opportunities, pool health, and relevant market or news context, and Crunchie turns that into a single, coherent response.
Crunchie’s promise is simple: DeFi intelligence without the hours of manual digging. The challenge is delivering on that promise reliably at production speed.
Disclaimer: Crunchie is a research tool for informational purposes only and does not provide financial, investment, or trading advice.
Before Agno, building Crunchie felt like playing conductor in a chaotic orchestra. Each agent had its own timing, communication style, and output format, and getting them to work together took more effort than the agents themselves.
The biggest pain was state management: the unsexy but critical layer that makes an agent remember what happened earlier in a conversation and keeps workflows consistent. Datai had agents running in parallel that needed to share context, and for a while the team was manually stitching outputs together.
Every workflow change triggered cascading work. Add an agent, tweak a step, adjust routing logic, and suddenly half the orchestration layer needed to be rewritten. It was fragile, slow to evolve, and one change could topple the whole system.
Datai could have built an in-house orchestration framework. The team had the capability. But the question wasn’t “can we build this?” It was “should we spend months building plumbing, and then maintain it forever?”
Agno pushed the decision over the line for a few reasons.
First, time to market. Building internally would have taken 6+ months. With Agno, Datai had a working multi-agent system in weeks.
Second, production-ready features out of the box. PostgreSQL-backed storage, session management, streaming, and error handling: all the areas that quietly eat months when you do them right.
Third, long-term maintenance. The team estimated they would spend roughly 40 percent of their time maintaining infrastructure with an in-house system. With Agno, that dropped to around 5 percent.
Finally, it was a solid foundation, not a black box. The architecture was clean enough to customize without fighting the framework.
Or as Datai Network’s CTO put it: “We stopped writing plumbing code and started writing business logic. That’s the real win.”
Agno didn’t just speed things up. It removed entire categories of work that don’t differentiate a product, but still determine whether it survives production.
Here’s what Datai no longer had to build from scratch.
Previously, conversation state was manually tracked with Redis key management and session timeouts. Now, Agno’s PostgreSQL-backed workflow storage persists workflow state. Datai passes a session_id, and the workflow stays consistent across runs.
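The pattern behind that shift can be sketched in plain Python. This is an illustrative stand-in, not Agno's actual API: a single store owns workflow state, keyed by a `session_id`, so every run reads and writes the same context instead of stitching Redis keys together by hand. `SessionStore` and its methods are hypothetical names for the sketch.

```python
import json
import sqlite3

# Illustrative sketch only (not Agno's real storage layer): one store owns
# workflow state, keyed by session_id, so every run sees the same context.
class SessionStore:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT)"
        )

    def load(self, session_id):
        row = self.conn.execute(
            "SELECT state FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, session_id, state):
        self.conn.execute(
            "INSERT INTO sessions (id, state) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET state = excluded.state",
            (session_id, json.dumps(state)),
        )
        self.conn.commit()

store = SessionStore()
state = store.load("user-42")        # {} on the first run
state["last_query"] = "BNB yields"
store.save("user-42", state)
print(store.load("user-42"))         # the same state survives the next run
```

The point of the pattern is ownership: the caller only ever passes a session id, and the storage layer is the single place state lives.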
Previously, the team had to build real-time streaming, SSE formatting, event filtering, and all the messy edge cases. Now, streaming is native. A single switch, stream=True, enables progressive output without weeks of custom work.
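To make concrete what a `stream=True` flag hides, here is a minimal sketch of the SSE plumbing a team would otherwise hand-roll. The token source is mocked; the framing follows the Server-Sent Events wire format (`data:` lines terminated by a blank line).

```python
from typing import Iterator

# Illustrative sketch (not Agno internals): the custom SSE plumbing that a
# framework-level stream=True flag makes unnecessary.
def generate_answer() -> Iterator[str]:
    # Stand-in for a model producing tokens progressively.
    for chunk in ["Top", " BNB", " pools", ": ..."]:
        yield chunk

def to_sse(chunks: Iterator[str]) -> Iterator[str]:
    # Server-Sent Events framing: each chunk becomes a "data:" line
    # terminated by a blank line, plus a final sentinel event.
    for chunk in chunks:
        yield f"data: {chunk}\n\n"
    yield "data: [DONE]\n\n"

for event in to_sse(generate_answer()):
    print(event, end="")
```

Even this toy version omits the messy parts the team mentions: client disconnects, mid-stream errors, and event filtering.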
Previously, running agents in parallel meant custom async coordination logic, step dependencies, and manual execution management. Now, Datai defines Step objects and Parallel blocks, and Agno handles execution order and coordination.
Previously, it was try/catch everywhere, plus fragile retry logic. Now, transient failures and retries are handled gracefully by default.
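The "handled gracefully by default" behavior amounts to a retry policy like the sketch below: retry transient failures with exponential backoff and jitter, and give up on everything else. This is a generic illustration of the pattern, not Agno's actual retry implementation; `TransientError` and `with_retries` are names invented for the sketch.

```python
import random
import time

# Illustrative sketch of default retry behavior: retry transient failures
# with exponential backoff plus jitter, re-raise once attempts run out.
class TransientError(Exception):
    pass

def with_retries(fn, attempts=3, base_delay=0.05):
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise
            # Backoff doubles per attempt; jitter avoids thundering herds.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

calls = {"n": 0}

def flaky_api():
    # Stand-in for an upstream call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return "ok"

print(with_retries(flaky_api))  # succeeds on the third attempt: "ok"
```

Centralizing this in one place is exactly what replaces the "try/catch everywhere" the team describes.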
The net effect was structural. Crunchie moved from a clever prototype held together by glue code, to a system the team could iterate on quickly.
Igor describes the new setup like a professional kitchen.
Before Agno, chefs were working in isolation, passing notes through runners, trying to coordinate timing manually. It was chaos. With Agno, there is a head chef, the workflow, coordinating specialists and making sure the final dish arrives coherent and on time.
Prompt Expert (Step 0)
Like a host who clarifies the order before it hits the kitchen. It cleans up ambiguous or malformed requests, for example “BNB” becomes “Binance Coin on Binance Smart Chain”, and it filters low-quality inputs early.
Parallel Analysis Station (Steps 1–3)
Three specialist agents work simultaneously.
APY Expert: identifies yield opportunities based on selected criteria.
Pool Expert: analyzes liquidity pools and pool health.
News Expert: gathers market sentiment and relevant news context.
Coordinator (Final Step)
The coordinator takes outputs from the parallel agents and synthesizes a single response, ranked, structured, and readable.
Agno’s role is everything between those steps: timing, coordination, state persistence, and streaming responses as they are generated.
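The kitchen pipeline above can be sketched end to end. The agents here are mocked as plain functions and the parallel stations run on a thread pool; in Crunchie each would be an LLM agent with its own tools, and Agno's Step/Parallel constructs would replace the hand-written coordination. Every name in this sketch is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative end-to-end sketch of the kitchen-style pipeline
# (agents mocked as plain functions; not Agno's actual Step/Parallel API).
ALIASES = {"BNB": "Binance Coin on Binance Smart Chain"}

def prompt_expert(query: str) -> str:
    # Step 0: normalize ambiguous tickers before the kitchen sees the order.
    for alias, canonical in ALIASES.items():
        query = query.replace(alias, canonical)
    return query

def apy_expert(q):  return f"APY view of: {q}"
def pool_expert(q): return f"Pool health for: {q}"
def news_expert(q): return f"News context on: {q}"

def coordinator(parts):
    # Final step: synthesize the specialists' outputs into one response.
    return "\n".join(parts)

def run_workflow(query: str) -> str:
    clean = prompt_expert(query)
    with ThreadPoolExecutor() as pool:
        # Steps 1-3 run simultaneously, like stations in a kitchen.
        results = list(pool.map(lambda f: f(clean),
                                [apy_expert, pool_expert, news_expert]))
    return coordinator(results)

print(run_workflow("High-yield options for BNB?"))
```

What the framework adds on top of a sketch like this is precisely the invisible layer: step dependencies, state persistence between runs, and streaming the coordinator's output as it is produced.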
The worst production issue was Crunchie forgetting context mid-conversation. A thread would be flowing, and then suddenly the agent behaved like it had amnesia.
The cause was subtle. Datai was manually managing session state in Redis while Agno’s workflow storage, backed by PostgreSQL, maintained its own session layer. Those systems weren’t synchronized, so state drifted over time.
The fix was straightforward: stop fighting the framework and let Agno own workflow state. PostgreSQL-backed workflow storage became the single source of truth for workflow state, Redis was used only for conversation-history caching, and Agno managed session persistence end-to-end.
The result: session-state issues dropped to near zero. It went from weekly debugging to something the team barely had to think about.
Crunchie also needed to stay anchored in real data, especially when users ask for APYs, pool risk, and market context. Datai layered multiple safeguards.
First, a vector knowledge base using LanceDB. DeFi protocols, token data, and documentation are embedded and queried first, like a reference library the agent consults before answering.
Second, tool-based live data retrieval. Crunchie pulls current metrics and context through tool calls to Datai APIs, for example token health, pools and pairs data, and news sentiment. If the agent needs numbers, it has to call tools, not invent them.
Third, prompt constraints. Instructions push the agents to verify via tools, not assume. The coordinator synthesizes only from verified outputs.
Fourth, a Redis conversation cache. Verified Q&A pairs are cached to reduce contradictions and preserve consistency across turns.
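Two of those safeguards can be sketched together: numbers must come from a tool call rather than the model's imagination, and verified answers are cached with a TTL so later turns stay consistent. This is a stdlib illustration (a dict with timestamps standing in for Redis, and `fetch_apy` standing in for a live Datai API call), not the production implementation.

```python
import time

# Illustrative sketch of two safeguards: numbers only via tool calls,
# and verified Q&A pairs cached with a TTL (dict standing in for Redis).
def fetch_apy(pool: str) -> float:
    # Stand-in for a live Datai API call; the agent may not invent this number.
    live_data = {"BNB/USDT": 4.2}
    return live_data[pool]

class QACache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.entries = {}  # question -> (answer, stored_at)

    def get(self, question: str):
        hit = self.entries.get(question)
        if hit and time.monotonic() - hit[1] < self.ttl:
            return hit[0]
        return None  # missing or expired

    def put(self, question: str, answer: str):
        self.entries[question] = (answer, time.monotonic())

cache = QACache()
question = "Current APY for BNB/USDT?"
answer = cache.get(question)
if answer is None:
    # Cache miss: the number comes from a tool call, then the verified
    # answer is cached so later turns do not contradict it.
    answer = f"APY for BNB/USDT is {fetch_apy('BNB/USDT')}%"
    cache.put(question, answer)
print(answer)
```

The TTL matters in DeFi: a cached APY should expire quickly enough that "consistent across turns" never becomes "stale across hours."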
Datai saw clear wins across speed, cost, and reliability.
Note: the metrics below are based on internal production measurements and can vary by chain, query complexity, market conditions, and workload patterns.
Development time: about 3 weeks down to 2–3 days, roughly 75–80 percent faster.
API cost per complex query: about $0.15–$0.25 down to around $0.08–$0.12, roughly 40–50 percent lower.
Latency: about 45–90 seconds down to 15–30 seconds, roughly 60 percent faster.
Failure rate: 5–10 percent down to under 1 percent, roughly a 90 percent reduction.
Orchestration code: around 2,000 lines down to roughly 500 lines, about 75 percent less.
These numbers don’t just look good in a dashboard. They change what is possible. When you can iterate in days, ship safer workflows, and answer in around twenty seconds, you can experiment and improve without constantly paying an infrastructure tax.
The biggest unlock wasn’t smarter agents. It was reducing orchestration and state complexity so the team could iterate quickly, while keeping responses grounded in real data.
A user asked, “Find me high-yield opportunities for BNB on BSC right now, including risks and recent news.”
Behind the scenes, the Prompt Expert clarifies the request, BNB becomes Binance Coin on Binance Smart Chain. Then three parallel agents run simultaneously. The APY Expert identifies opportunities using available yield and pool data. The Pool Expert evaluates liquidity depth, volume, and pool health. The News Expert pulls sentiment and relevant recent context.
Finally, the Coordinator assembles a readable markdown response. It includes ranked opportunities by risk-adjusted yield, based on configured criteria, the latest available numbers at query time, risk considerations tied to pool health metrics, and news context that may affect conditions. It ends with structured insights and considerations, not advice.
The user’s reaction was simple: “How did you get all this data so fast? This would’ve taken me hours.”
A few unexpected wins stood out.
Streaming just worked: output arrived progressively without custom SSE headaches.
Session management became invisible: migrations, persistence, and recovery were handled automatically.
Docs and community mattered more than expected: edge cases got solved faster with examples that actually worked.
Prompt filtering drove major savings: filtering malformed queries cut API costs by around 30 percent, more than expected.
Debugging improved: step-level visibility made production issues easier to trace, like having a time machine.
If you’re trying to ship agents that work in production, not just demos, the real question is how quickly you can iterate without breaking state, reliability, or costs.
Start by defining your workflow like a kitchen: a prompt-cleaning step, parallel specialists, and a coordinator. Make grounding non-negotiable: a knowledge base plus tool calls for real numbers, and coordinator synthesis only from verified outputs. Treat session state as infrastructure: pick a single source of truth and don’t split ownership.
Want to explore building agents on this stack, powered by Datai’s on-chain intelligence and orchestrated with Agno?
Let’s talk. Send us an email at [email protected]