Key Takeaways
- Braintrust Data Inc. secured $80 million in Series B funding, led by ICONIQ Capital, with participation from Andreessen Horowitz, Greylock, Basecase Capital, and Elad Gil. The round values the company at $800 million, reflecting strong investor confidence in AI infrastructure platforms.
- As enterprises integrate AI agents and large language models into mission-critical workflows, structured evaluation frameworks have become essential. Braintrust delivers infrastructure that measures model performance, identifies hallucinations, detects data drift, and flags regressions before they affect end users.
- The platform is already embedded within leading AI-driven enterprises such as Notion, Replit, Cloudflare, Ramp, Dropbox, Vercel, Navan, and BILL. This adoption indicates increasing demand for continuous AI observability and production-grade monitoring tools.
- The newly raised capital will be allocated toward expanding engineering capabilities, strengthening go-to-market operations, establishing additional office locations, launching enhanced observability features, and entering new geographic markets.
Quick Recap
San Francisco-based Braintrust Data Inc. has officially announced the close of an $80 million Series B funding round, led by ICONIQ Capital at an $800 million post-money valuation. The round included returning backers Andreessen Horowitz, Greylock, Elad Gil, and Basecase Capital. The announcement was made via the company’s official X (formerly Twitter) account, with CEO Ankur Goyal signaling that Braintrust is “building the infrastructure that helps teams measure, evaluate, and improve their AI products”.
Inside Braintrust’s AI Observability Platform
Braintrust has built an AI-native observability and evaluation platform designed specifically for monitoring the quality of AI models and their outputs in production, a fundamentally different challenge than traditional system-health monitoring.
The platform integrates several critical workflows:
- Exhaustive Tracing: Automatically captures every step of an AI model or agent’s reasoning process, including prompts, tool calls, retrieved context, and metadata on latency and cost.
- Automated Evaluation: Uses built-in scorers and an LLM-as-a-judge approach to evaluate model outputs for accuracy, relevance, and safety. Teams can run both offline experiments during development and online scoring on live production traffic.
- Prompt Playground: A visual interface to test and version-control prompt changes against real production data before deployment.
- AI-Powered Assistant: Analyzes millions of traces to suggest better prompts, create new datasets, and identify patterns that cause specific hallucination types.
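To make the offline-evaluation workflow above concrete, here is a minimal sketch of the general pattern: run a model task over a dataset, score each output with a judge, and aggregate the scores into an experiment result. The dataset, `task`, and `judge` below are illustrative stand-ins, not Braintrust's actual SDK; in practice the task would call a model and the judge would often be an LLM itself.

```python
# Generic offline-evaluation harness (illustrative, not Braintrust's SDK).
from dataclasses import dataclass

@dataclass
class EvalResult:
    input: str
    output: str
    score: float  # 0.0 (fail) to 1.0 (pass), as assigned by the judge

def task(question: str) -> str:
    """Stand-in for the model under test (e.g. an LLM call)."""
    canned = {"What is 2+2?": "4", "Capital of France?": "Paris"}
    return canned.get(question, "I don't know")

def judge(output: str, expected: str) -> float:
    """Stand-in for an LLM-as-a-judge scorer; here, simple exact match."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_experiment(dataset: list[tuple[str, str]]):
    """Run the task over every example and aggregate judge scores."""
    results = []
    for question, expected in dataset:
        output = task(question)
        results.append(EvalResult(question, output, judge(output, expected)))
    mean_score = sum(r.score for r in results) / len(results)
    return results, mean_score

dataset = [
    ("What is 2+2?", "4"),
    ("Capital of France?", "Paris"),
    ("Largest planet?", "Jupiter"),
]
results, mean_score = run_experiment(dataset)
print(f"mean score: {mean_score:.2f}")  # 2 of 3 pass -> 0.67
```

The same loop generalizes to online scoring: instead of iterating a fixed dataset, the judge runs against sampled live production traces.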
Critically, all of this runs on Brainstore, Braintrust’s purpose-built database, which is reportedly 80% faster at querying complex AI traces than alternatives. This performance advantage is essential as enterprise AI deployments scale to millions of daily interactions.
What Leadership Is Saying
Matt Jacobson of ICONIQ noted that companies with enduring impact typically demonstrate strong and consistent customer focus. He stated that Ankur and the Braintrust team have embedded this principle into their product strategy from the outset, aligning development closely with evolving user requirements.
Competitive Landscape
The competitive intensity in AI observability is increasing at a measured but decisive pace. In February 2025, Arize AI secured $70 million in Series C funding to expand its large language model evaluation and monitoring capabilities. The round was positioned as one of the largest investments in the AI observability segment, reflecting growing enterprise demand for structured performance tracking and risk management across AI systems.
At the same time, Langfuse, widely adopted within the developer community, was acquired by ClickHouse in January 2026 as part of a $400 million Series D financing at a $15 billion valuation. The transaction highlights how observability is moving beyond a developer-focused capability and becoming a core component of enterprise-grade AI infrastructure, supporting governance, reliability, and scalable deployment.
| Feature / Metric | Braintrust | Arize AI | Langfuse (ClickHouse) |
| --- | --- | --- | --- |
| Latest Funding | $80M Series B (Feb 2026) | $70M Series C (Feb 2025) | Acquired by ClickHouse (Jan 2026) |
| Total Raised | ~$80M+ (Series B) | $131M across 4 rounds | $4.5M pre-acquisition |
| Valuation | $800M | Not publicly disclosed | Part of ClickHouse ($15B) |
| Open-Source Component | No | Yes (Phoenix – 2M+ monthly downloads) | Yes (MIT license, 20K+ GitHub stars) |
| Offline Evaluation | Full experiment workflows with Eval(), CLI, and UI | End-to-end evaluation workflows | Basic evaluation |
| Online Production Scoring | Automated scoring rules on live traces | Real-time monitoring and alerting | Prompt management + cost monitoring |
| AI-Powered Assistance | Built-in AI assistant for prompt suggestions and pattern detection | Limited | Limited |
| Pricing (Entry Tier) | Free tier; Pro at $249/month (unlimited users) | Enterprise-focused; $3/GB additional storage | Open-source self-host; managed pricing via ClickHouse |
| Notable Customers | Notion, Cloudflare, Replit, Ramp, Dropbox | Uber, Klaviyo, Tripadvisor | ClickHouse ecosystem |
| Key Differentiator | Brainstore database (80% faster trace queries); playground-driven UX | Deepest enterprise ML observability heritage | Open-source community + ClickHouse analytics engine |
Strategic Analysis
Braintrust leads in developer experience and UI-driven evaluation workflows, making it the strongest choice for product and engineering teams that want an integrated, non-code-heavy approach to AI observability.
Arize AI, with $131M in total funding and deep roots in traditional ML observability, holds the edge for large enterprises with complex, multi-model production environments. Langfuse, now backed by ClickHouse’s $15 billion infrastructure, offers the most compelling option for teams that prioritize open-source flexibility and self-hosting.
Bayelsa Watch’s Takeaway
I think this is a big deal: an $800 million valuation at the Series B stage for an observability-focused company indicates a structural shift in the AI ecosystem. The industry is moving beyond rapid model deployment toward ensuring models operate reliably, consistently, and within defined performance standards.
Across the AI infrastructure landscape, capital is increasingly being allocated to accountability rather than experimentation. While many startups previously raised funding based on model capability claims, Braintrust’s positioning centers on evaluation, transparency, and measurable outcomes.
