Resistance is futile: AIOps is the future

AIOps tech promises to help IT ops and development teams cope with the growing dynamic complexity of enterprise software maintenance, yet enterprises have been resisting that allure. Here’s why they can’t (and won’t) hold off much longer.

Photo by Arseny Togulev

Enterprises have been hesitant to embrace artificial intelligence and machine learning (AI/ML) to beef up their IT operations management, but analyst firm 451 Research predicts that more enterprises will be turning to AIOps in the next couple of years – if only because they won’t have much choice.

Many enterprises across all sectors are becoming more software-centric to play in the digital era and keep up with both consumer demand and the competition. That has fueled the rise of technologies like containers, Kubernetes and microservices (to name a few), as well as a DevOps culture that enables developers to push out new products and features more quickly.

However, this software-centric shift is also creating far more complex and dynamic IT environments, which is putting pressure on IT ops teams to manage it all. According to a recent survey from 451 Research, 75% of respondents agreed to some level that IT operations and maintenance is taking up too much time.

“To hammer this home, we asked developers how they were spending their time,” explained 451 Research analyst Nancy Gohring during a recent webinar on AIOps, “and 55% said they spent the bulk of their time on maintenance and management of custom apps supporting business processes, as opposed to developing new apps and features to differentiate from the competition and respond to consumer demand.”

During the same webinar, Will Cappelli, EMEA CTO at Moogsoft, observed that IT systems are getting harder to manage because they’ve become more modular, dynamic and volatile over the last ten years. “It’s so complex that it’s hard for the human mind to fully comprehend what’s going on, let alone diagnose the root cause of performance problems or anticipate outages or incidents.”

Naturally, that’s where AI and ML come in. According to Cappelli, just about every aspect of IT ops management – from monitoring tools to service desks, CMDBs, runbook automation and smart alerts – can be enhanced by AI/ML, which will help the development and ops teams cope with that complexity and get more value out of those tools. Indeed, the main attraction of AIOps is its ability to automate processes, help enterprises gain more useful insights from big data, and generally make IT ops more efficient with fewer maintenance tickets, faster responses to problems and faster remediation.

Moreover, Cappelli of Moogsoft envisions an IT ops architecture in which AI eventually becomes the central “brain” linking disparate areas of IT ops together.

“From an architectural standpoint, you want to be enhancing your existing monitoring tools, log management tools, time-series database tools, event management tools and so on with some kind of AI/ML capability,” he explains, “but you also want to leave room in your architecture for that coordinating layer where AI is going to effectively sit at the heart of it and act as a central, smart coordinating switch that takes you across the different areas of IT ops management.”

AIOps is inevitable

AIOps sounds like a no-brainer when you put it like that, but of course it’s not that simple. According to Gohring, while enterprises want to believe in the promise of AIOps, they’re still suspicious of the market hype.

It hasn’t helped that early versions of AI/ML-enabled monitoring and maintenance tools weren’t that user-friendly, or lacked critical features like root cause analysis, collective intelligence, correlation of anomalies and past-incident analysis.

Now that AIOps tools are emerging that support these essential functions, 451 Research is predicting an upswing in adoption of AIOps this year and next as the inherent advantages become apparent. However, that upswing will be driven not just by the availability of more sophisticated tools, but by the fact that software-driven enterprises in particular will have their hand forced by the realities of IT complexity.

“If you have a sophisticated cloud native app using containers and Kubernetes, it’s complex and dynamic, and if you don’t upgrade your monitoring system from something that was out in the market ten years ago, the risk is you’ll have terrible performance and an unreliable system and it’s going to take you weeks to solve performance problems,” she says.

She adds that as IT ops teams start trying out AI/ML-powered tools and realize the benefits, they’ll get more comfortable with them and apply them to solutions like auto remediation. “Once people continue to use tools that are using ML to identify root cause, and see these tools are doing that accurately, they’ll embrace auto remediation. That will happen relatively soon, in the next year or two.”

Team friction

However, Gohring continues, don’t expect a smooth ride. One challenge facing IT ops teams is that up to now, DevOps teams have typically been given the autonomy to choose their own monitoring tools, which can result in too many different tools across different teams. This is especially a problem for enterprises with many DevOps teams organized around interlinked microservices.

“If you’re all using different tools, and you have zero visibility into another microservice that your microservice might depend on, you’re going to have a lot of problems when performance issues pop up,” she explains.

Many organizations have tackled this problem by forming a central team that takes over some functions such as choosing and managing monitoring tools for the DevOps teams. This puts everyone on the same page, improves visibility across microservices and allows more of a centralized view that can be helpful when performance problems pop up.
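To make the visibility point concrete, here is a minimal Python sketch of the kind of question a centralized view can answer: when one microservice degrades, which other services might be involved? The service names and the dependency map are hypothetical and were not discussed in the webinar. In practice such a map would more likely come from tracing data or a CMDB than from a hand-written dictionary.

```python
def transitive_dependencies(graph, service):
    """Return every service that `service` depends on, directly or indirectly.

    `graph` maps a service name to the list of services it calls.
    """
    seen = set()
    stack = list(graph.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            # Follow this dependency's own dependencies as well.
            stack.extend(graph.get(dep, []))
    return seen


# Hypothetical dependency map, for illustration only.
dependency_map = {
    "checkout": ["payments", "cart"],
    "payments": ["ledger"],
    "cart": [],
    "ledger": [],
}

# Services to inspect when "checkout" slows down.
print(sorted(transitive_dependencies(dependency_map, "checkout")))
```

A team that can only see its own service would stop at "checkout"; with a shared map, the search widens to everything downstream of it.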

The snag is that DevOps team members can get precious about the tools they want to use and don’t like being told to use something else, Gohring says. “Offloading [tools] to a centralized team you can collaborate with makes sense once it happens, but that hurdle of giving up your autonomy will continue to be a challenge.”

Cappelli of Moogsoft agrees that DevOps teams will need more convincing than IT ops teams that AIOps can add value to what they do. “But I do believe that over the next five years, the DevOps world will come to see that an observability-centric approach to IT management – a focus on metrics, logs and traces, and the analysis of these data types in real time – will indeed require AI/ML technologies in particular because those data sets are incredibly complex, dynamic, volatile and noisy. If they want to deliver effective observability, they’re going to need to call on the resources of AI technologies.”
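As a rough illustration of what analyzing metrics in real time can mean at its very simplest, the Python sketch below flags points in a metric stream that deviate sharply from the readings just before them. It is a basic rolling z-score check, not a description of how Moogsoft or any other AIOps product works, and the function name, window size, threshold and latency values are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate sharply from the preceding window."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        # Only score a point once a full window of history is available.
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        recent.append(value)
    return anomalies


# Hypothetical latency readings in milliseconds, with one obvious spike.
latency_ms = [101, 99, 102, 100, 98, 101, 250, 100, 99, 102]
print(rolling_zscore_anomalies(latency_ms))  # [6]
```

A check this simple breaks down quickly on the kind of noisy, fast-changing data Cappelli describes, which is part of the argument for more capable AI/ML techniques.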

Meanwhile, another challenge to AIOps adoption is that it has a tendency to disrupt existing job descriptions and associated skillsets.

Gohring cites one mid-level management employee who said that AIOps actually resulted in the company’s ops team becoming less effective because most of the problems were software defects, which the ops team couldn’t resolve because they didn’t write code. In this case, they had to be retrained from serving as “alert babysitters” to becoming analysts who could respond to the anomalies that AI discovers.

The skillsets challenge is exacerbated by the fact that – at the moment – there are no AIOps training and certification programs employees can attend to upgrade their skills.

“I think enterprises just have to get creative – give your people the leeway to network with their peers internally and externally to pick up the skills and expertise they need,” Gohring says.