AI’s secret weapon: Backup data

Like an iceberg, most data value lies beneath the surface. Image courtesy of Unsplash.

AI is here to stay, yet enterprises often rush to embrace it as the latest technology trend. More often than not, their projects fail to move beyond the proof-of-concept stage because the data is scattered, stored in the wrong format, or located in the wrong place.

For data protection and management provider Cohesity, fixing the data problem is not optional; it is essential to fully leverage AI’s potential. During a media briefing, Chief AI Officer Craig Martell and Global Field CISO Joye Purser discussed the power of backup data in unlocking new business opportunities.

Business proposition

Craig Martell, Chief AI Officer, Cohesity. Image courtesy of Cohesity.

Martell, who previously worked at Lyft, Dropbox, and the United States Department of Defense, joined Cohesity in 2024. He said one key reason convinced him to make the jump.

“I’ve had a strong relationship with the company for about 10 or 11 years. In particular, I was super impressed with the founder, Mohit Aron’s vision of being able to do analytics on top of secondary data,” he recalled.

Martell demonstrated the concept by using secondary data to simulate a future recession: “If you use data from the last two or three years, there’s nothing in there about being in a recession. The beauty of secondary or backup data is that it spans your organisation’s entire history. You could go back to 2007 or 2008, which is far more relevant for modelling a potential recession. That ability — to treat your company’s data as a resource for AI, business analytics, or machine learning — is really what motivated me.”

Zeroing in on the current enterprise AI landscape, Martell said organisations should not rush into the hype, as doing so could lead to project failure.

“Don’t worry about AI, there will always be an AI vendor. Your fundamental job is to get your data right,” he noted.

Data access

The problem with having data spread across multiple systems, Martell stressed, is the lack of a single source of truth.

“I like to call it a single plane of data. The job of the chief data officer is to identify the organisation’s important data, gather it, and make it accessible or extract value from it. As it turns out, the CIO has already been doing that. Backing up data essentially addresses many of these issues, though not all. But what is backing up? What do you need to do when you back up?” Martell said.

Since funding for data resilience is typically easier to obtain than funding for data modelling, Martell argued that money is not the main obstacle.

“You’ve secured the funding, and you’ve already decided which business data is important, because that’s what you’re backing up. You’ve gone through the trouble of building the IT infrastructure, implementing connectors to all the apps that contain that data, and centralising it in a backup repository. Traditionally, that backup was intended for emergencies and stored in a way that made it difficult to access. You’d have to rehydrate the data to get it back,” he explained.

Unlike legacy backup systems, Cohesity’s backup infrastructure is built as a file system in which every file is directly accessible, Martell said.

“Traditional backup is based on snapshots and deltas: delta, delta, delta, then a big snapshot, and so on. If you wanted version 1.7, you’d have to start from the beginning, find the seventh version, merge all the changes, and then rebuild the file. That process comes with very high latency,” he observed.
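The latency Martell describes can be illustrated with a minimal sketch of a snapshot-plus-delta chain. This is a toy model, not any vendor's actual format: the block layout, the `restore_from_chain` function, and the sample data are all assumptions made for illustration. The point it shows is that restoring a version means replaying every delta recorded before it.

```python
from typing import Dict, List

# Hypothetical model: a file is a mapping of block index -> block content.
Blocks = Dict[int, str]


def restore_from_chain(snapshot: Blocks, deltas: List[Blocks], version: int) -> Blocks:
    """Rebuild `version` by replaying deltas 1..version over the full snapshot."""
    if not 0 <= version <= len(deltas):
        raise ValueError("unknown version")
    blocks = dict(snapshot)          # start from the last full snapshot
    for delta in deltas[:version]:   # replay every intermediate delta, in order
        blocks.update(delta)
    return blocks


# Illustrative data: one full snapshot followed by three deltas.
snapshot = {0: "a0", 1: "b0", 2: "c0"}
deltas = [{1: "b1"}, {2: "c2"}, {0: "a3", 1: "b3"}]

v2 = restore_from_chain(snapshot, deltas, 2)
v3 = restore_from_chain(snapshot, deltas, 3)
```

In this model the work to restore a version grows with the number of deltas since the last full snapshot, which is the cost the quote attributes to traditional backup.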

To illustrate Cohesity’s approach, Martell used a tree to represent the file system.

“These nodes contain information about the file. That’s the first version. The second version contains only the changes, and points to where those changes belong in the original file. The third version contains changes since then, and also points to their corresponding locations. Instead of having to merge each of these deltas manually, the system merges them as it builds the file system. If you want version 1.0, you start at the blue and follow the correct path. If you want version 1.2, you start at the red. For version 1.3, you start at the green. That means every version is actually a file,” he said.
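The tree Martell describes resembles a general technique sometimes called a copy-on-write (or persistent) tree. The sketch below is an illustration of that general idea only and is not Cohesity's implementation; the `Node`, `build`, `write`, and `read` names and the four-block file are hypothetical. Each version has its own root, a new version stores only the changed path, and unchanged subtrees are shared with earlier versions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    data: Optional[str] = None  # set only on leaves


def build(blocks: List[str]) -> Node:
    """Build a tree whose leaves hold the blocks of a non-empty file."""
    if len(blocks) == 1:
        return Node(data=blocks[0])
    mid = len(blocks) // 2
    return Node(left=build(blocks[:mid]), right=build(blocks[mid:]))


def write(root: Node, size: int, index: int, data: str) -> Node:
    """Return a new root with one block changed; untouched subtrees are shared."""
    if size == 1:
        return Node(data=data)
    mid = size // 2
    if index < mid:
        return Node(left=write(root.left, mid, index, data), right=root.right)
    return Node(left=root.left, right=write(root.right, size - mid, index - mid, data))


def read(root: Node) -> List[str]:
    """Read a whole version by walking down from its root."""
    if root.left is None:  # leaf
        return [root.data]
    return read(root.left) + read(root.right)


# Three versions of a four-block file; each root is a complete, readable file.
v1 = build(["a0", "b0", "c0", "d0"])
v2 = write(v1, 4, 1, "b1")
v3 = write(v2, 4, 3, "d3")
```

Reading any version starts at that version's root and follows pointers down, with no delta replay. Here `v2.right is v1.right` holds, so the unchanged half of the file is stored once and referenced by both versions.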

Security concerns

After solving the data access problem, the next priority is security, an equally important concern that cannot be treated as an afterthought. For Martell, data security, particularly in the context of AI, should not fall solely on the CISO’s shoulders.

Joye Purser, Global Field CISO, Cohesity. Image courtesy of Cohesity.

“Anybody can be a victim of phishing, and we need to think about AI in the same kind of whole-of-organisation way. There’s a lot the organisation needs to understand, starting with the fact that AI models are inherently statistical, which means they’re guaranteed to be wrong sometimes. How is product management designing this product, knowing it’s guaranteed to be sometimes wrong? How does the design team build it so customers understand that it may get things wrong? And what’s the escape hatch for the customer when that happens?” Martell said.

For Purser, the manufacturing industry is a clear example of how data security gaps can cripple operations. She noted that while human safety is the primary risk in any business, the next major concern in manufacturing is operational technology (OT) security.

“Let’s say Toyota encounters a business stoppage, a manufacturing stop — then you’re losing billions of dollars a minute. There are a variety of risks when OT breaks,” she said.

Purser stressed that OT systems are often more outdated than other software systems, giving threat actors an opening into the company’s broader ecosystem.

“That can cause lapses in the machines behaving the way they need to,” she added.

In the global context, she observed that the nature of security threats often depends on geographic location.

“I find that who is attacking you depends on where you are. For example, when I visited Europe, they were very concerned about Russian cyberattacks because the physical warfare was in close proximity to them. Hence, they’re feeling very threatened by Russia. Of course, the US has a near-peer dynamic with China, and also with Russia. And depending on where you are in Southeast Asia, that dynamic changes,” she remarked.

Purser went on to differentiate between ransomware-as-a-service groups, state-sponsored attackers, and cyber gangs.

“For ransomware as a service, they run it like a business with specialists. There’s a specialist to breach the system, another to move laterally within the network, and a third to identify the password store area. Nation states are trained differently. They may also have specialists, but they’re not motivated by money. And like any business of thieves, the cyber gangs probably have a good amount of fraud and money laundering. So you’re going to see different behaviours, and that’s reflected in the attack signature,” she explained.

Business thrust

At present, Martell sees Cohesity’s merger with Veritas as a major advantage in the data protection and management space. “The value of integrating with us is we’ve already connected to all of those apps out there. We’ve connected to apps they would never have thought about connecting to. We integrated apps through the Veritas acquisition that are 30 years old, because some enterprises are still using them.”

Asked whether he considers Cohesity to be in direct competition with companies like Elastic or Splunk, Martell disagreed.

“I could actually see Elastic being part of our solution, because the key is that we have all that data, and we need to retrieve it for whatever app is necessary. Elastic may easily be part of that solution. How we retrieve the data for them might vary, depending on the company. If a company is an Elastic shop, then we’d integrate nicely with Elastic. If they’re using something different, like Google Cloud and Google’s indexing solutions, then we could integrate with that instead,” he said.

In conclusion, Martell offered advice to enterprises exploring AI while wrestling with ethical concerns such as bias.

“Sometimes you gather the wrong data from the past — even if the future resembles the past — and it doesn’t predict the future correctly because the data was gathered poorly. The algorithms are never biased. It’s always the data used to train the algorithm that’s biased. How do we gather data in a way that avoids building a biased model? And if we realise we’ve built one, how do we fix it? These aren’t just problems for machine learning engineers, they’re problems for the entire corporation,” he said.

Editor’s note: The media briefing featured above took place in late April. Craig Martell will be joining a new company by July 2025.
