Data streaming has been a major driving force in turning enterprise chaos into clarity. In the age of agentic AI, its role is only set to grow.
According to Jay Kreps, Co-Founder and CEO of Confluent, as enterprises build out AI agents, they will need an underlying platform that eliminates data siloes and makes sense of unstructured data.
“With AI, you still need to bring all the data together, but now it has to happen continuously, in real time. You still need to process the data and ensure structure and quality, but it’s no longer about batch jobs; it’s real-time stream processing. That’s where data streaming platforms come in. This need has driven the growth of data streaming in recent years, and the rise of AI is accelerating that even further,” he remarked during the Confluent Current 2025 conference in Bangalore.
Practical use cases
To illustrate the intersection of data streaming and agentic AI, Kreps pointed to delivery companies and grocery stores that struggle to maintain accurate product catalogues.
“If you’re delivering groceries, you need to pull in information on products, stock, and inventory from every store across every location. Then you have to create a normalised view: What products are actually available, and where can you get them? To drive relevant advertising, you also need clear descriptions. You need to know which items are the same, and which ones are interchangeable. Unsurprisingly, the data is really messy,” he explained.
The actual workflow, he added, involves both software and people — some are pulling in data, others are tagging items and manually connecting them. This process requires considerable effort to keep it accurate and up to date, which ultimately delays the flow of information.
Rather than handling this manually, businesses can now deploy large language models (LLMs) to make sense of the data, Kreps said.
“Are these the same item? What’s the best way to describe this? Given five overlapping descriptions, which category does it belong in? How should I tag it? What would advertisers want to target? A language model can do all of that,” he said.
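In code, that normalisation step might look something like the sketch below: a prompt asking a model whether two raw catalogue entries describe the same product, which category it belongs in, and how to tag it. The category list and the call_llm parameter are hypothetical placeholders for whatever taxonomy and model client a team already uses, not anything Kreps specified.

```python
import json

# Hypothetical category taxonomy for a grocery catalogue.
CATEGORIES = ["dairy", "produce", "bakery", "beverages", "household"]

def build_prompt(desc_a: str, desc_b: str) -> str:
    """Ask the model to reconcile two messy catalogue entries."""
    return (
        "You are normalising a grocery catalogue.\n"
        f"Item A: {desc_a}\n"
        f"Item B: {desc_b}\n"
        f"Categories: {', '.join(CATEGORIES)}\n"
        "Reply in JSON with keys: same_item (true/false), category, tags (list of strings)."
    )

def normalise_pair(desc_a: str, desc_b: str, call_llm) -> dict:
    """call_llm is a placeholder for whichever LLM client the team uses."""
    raw = call_llm(build_prompt(desc_a, desc_b))
    return json.loads(raw)  # e.g. {"same_item": true, "category": "dairy", "tags": ["milk", "2l"]}
```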
Ultimately, the goal is to connect these systems to other software and respond to business events as they happen.
“For a simple agent like this, it’s basically a microservice that uses LLMs in the background. It treats some of these incoming events like a sensory system to understand what’s happening. It uses that to build the context data it needs to make decisions, and then it acts — based on what it sees, what’s happening, the input, the product uploads, and the descriptions it processes. It’s taking action, and there’s a reason we build microservices with Kafka,” he said.
Kreps noted that this kind of event-driven microservice has several traits that make it effective across a wide range of domains, even without LLMs. It allows users to decouple components and link them together as needed, enabling modular deployments.
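A minimal sketch of such an agent, written against the open-source confluent-kafka Python client, is shown below. The topic names, consumer group, and the enrich_with_llm step are illustrative assumptions rather than details from the talk: the agent consumes raw product events, uses a model to normalise them, and publishes the result for downstream services.

```python
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumption: local broker for illustration
    "group.id": "catalogue-agent",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["product-uploads"])      # hypothetical topic of raw store feeds

def enrich_with_llm(raw_event: bytes) -> bytes:
    """Placeholder: dedupe, categorise and tag the product via an LLM call."""
    return raw_event

try:
    while True:
        msg = consumer.poll(1.0)             # the "sensory system": react to events as they arrive
        if msg is None or msg.error():
            continue
        producer.produce("normalised-catalogue", key=msg.key(), value=enrich_with_llm(msg.value()))
        producer.poll(0)                     # serve delivery callbacks
finally:
    consumer.close()
    producer.flush()
```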
Operational challenges
Kreps noted that while many enterprises want to deploy AI, much of their data remains trapped in legacy applications that are deeply embedded in business operations. Rebuilding everything from scratch might appear to be a clean way to sidestep infrastructure issues, but from a cost perspective, it’s a big gamble.
“Too often, the message from new tech companies is: rebuild everything using our product and everything will be great. But that’s just not practical. We don’t think you need to rewrite all your applications. Instead, we want to connect to the data where it already lives, capture those real-time streams, and let new systems take advantage of that data. Modernisation should happen on its own schedule — where it makes sense and delivers ROI. It can’t be a forced march where everything changes at once just to gain the capabilities businesses need to stay competitive,” he explained.
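One common way to connect to data where it already lives is change data capture through Kafka Connect. The sketch below registers a Debezium PostgreSQL source connector over the Connect REST API so that row changes in a legacy database flow into Kafka topics; the hostnames, credentials and table list are placeholder assumptions, and Debezium is simply one widely used open-source option, not necessarily the tooling Kreps had in mind.

```python
import requests

# Hypothetical Kafka Connect worker and legacy database details, for illustration only.
connector = {
    "name": "erp-orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "legacy-erp.internal",
        "database.port": "5432",
        "database.user": "cdc_reader",
        "database.password": "********",
        "database.dbname": "erp",
        "topic.prefix": "erp",                          # change events land on topics like erp.public.orders
        "table.include.list": "public.orders,public.inventory",
        "plugin.name": "pgoutput",                      # logical decoding plugin built into modern Postgres
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print("Registered connector:", resp.json()["name"])
```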
To support compliance in highly regulated industries, Kreps outlined three steps Confluent is taking:
- The first is making Confluent’s software fit customers’ needs, which means offering a product that can run in customers’ own data centres and across different cloud environments.
- The second is integrating the security and resiliency features needed to meet compliance demands.
- The third is structuring how data moves: how it flows, how it can be controlled, and how governance can be applied.
“Part of the value of data streaming is that it brings structure to how data moves across the organisation. That structure lets you enforce controls and prove what happened to valuable data, like PII or other critical business assets,” Kreps said.
He added that with a structured approach, organisations can more easily track where data ends up and how it is used. This is far more effective than relying on dozens of ad hoc methods, which often make it difficult to establish a clear audit trail.
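As a rough illustration of what those controls can look like in practice, the helper functions below mask fields tagged as PII before a record leaves a governed stream and build a simple audit entry recording which consumer read what. The field names and audit format are hypothetical and not drawn from Confluent’s governance tooling.

```python
import json

# Hypothetical set of fields tagged as PII in a customer schema.
PII_FIELDS = {"email", "phone", "address"}

def mask_pii(record: dict) -> dict:
    """Enforce a simple control: mask tagged PII fields before data flows downstream."""
    return {k: ("***" if k in PII_FIELDS else v) for k, v in record.items()}

def audit_entry(topic: str, offset: int, consumer_group: str) -> str:
    """Build a provable record of which consumer read which message, for an audit topic."""
    return json.dumps({"topic": topic, "offset": offset, "group": consumer_group})
```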
Democratising technology
As AI use cases continue to grow, Kreps believes that every engineer working with data will eventually become an AI engineer.
“This is going to be a key part of the toolkit, just like databases are. Learning new programming languages, understanding basic deployment tools, working in the cloud — these are foundational skills we’ve all picked up to solve problems. Now, we’re at a point where the scope of what a software engineer can do is much broader. And what organisations can build with software is much broader too,” he observed.
While Confluent continues working towards making data streaming ubiquitous, Kreps acknowledged that it’s not enough to simply build the technology. The real goal is to empower everyone to take advantage of it.
To support this, the company focuses on two areas: simplifying the technology itself, and fostering a community of practitioners.
“Conferences like this help bring together the use cases, where people can learn from companies that have done this before — what they did, the challenges they faced, what worked, and what didn’t,” he concluded.