AIOps – the next step in the cloud computing revolution

When it comes to digital transformation, every organisation, large or small, has a plan in mind. However, if the past two years have taught us anything, it is that even the best-laid plans run the risk of being sidetracked. With the global crisis pushing remote work into the mainstream, huge amounts of data and applications have had to migrate to the cloud, sometimes well ahead of schedule. Applications, in particular, are now distributed across a wider network, adding extra layers of complexity to the already complex work of CloudOps teams.

The rapid upscaling of CloudOps has also exposed some of its shortcomings. While CloudOps teams are no strangers to issues like storage failures, data corruption and network outages, these issues are often fixed only after the fact. Considering the complexities at hand, being reactive is no longer viable in the world of cloud computing.

A more proactive approach, then, is required — so proactive, in fact, that issues may very well be detected and addressed before they even occur. This is not science fiction, but a piece of technology that already exists: Artificial Intelligence (AI).

Enter AIOps

Applying AI to IT operations, or AIOps, is the natural, inevitable next step in CloudOps’ evolution, and the approach has many benefits. First, it ensures better visibility. The speed, volume and variety of digital data that we amass every day are simply beyond what conventional human-led CloudOps can handle effectively. AIOps, on the other hand, thrives on data. The sheer volume of information available about IT environments poses no threat to machine learning-backed AIOps because the more data one feeds the system, the better it understands what the environment looks like. This benefits end users too, as data centre managers are now that much closer to a single, unified dashboard that shows only the most critical alerts and information.

Secondly, AIOps improves risk detection. Understanding what the entire IT environment looks like also means that AIOps is uniquely positioned to identify tell-tale signs of issues before they escalate. For example, if a server on the network is running out of disk space, an AIOps tool could allocate more storage automatically, in real time, without any human intervention. More sophisticated versions of AIOps can even provide predictive information, which means that smaller issues can be addressed before they become serious or even catastrophic problems down the line.
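To make the disk-space example concrete, here is a minimal sketch of how such a prediction might work. It assumes hourly disk-usage readings expressed as percentages and uses a plain least-squares trend line as a simple stand-in for whatever model a real AIOps product would use. The function names, the 24-hour horizon and the sample data are all hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence


def hours_until_full(used_pct: Sequence[float],
                     capacity_pct: float = 100.0) -> Optional[float]:
    """Fit a least-squares line to hourly usage samples and estimate how
    many hours remain before usage reaches capacity.

    Returns None when there are too few samples or usage is not growing.
    """
    n = len(used_pct)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(used_pct) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_pct))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var  # percentage points gained per hour
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Hour at which the fitted line hits capacity, measured from the last sample.
    hours_left = (capacity_pct - intercept) / slope - (n - 1)
    return max(hours_left, 0.0)


def should_expand(used_pct: Sequence[float], horizon_hours: float = 24.0) -> bool:
    """Flag the volume when the projected fill time falls within the horizon
    (hypothetical policy: a real tool would then trigger provisioning)."""
    eta = hours_until_full(used_pct)
    return eta is not None and eta <= horizon_hours


if __name__ == "__main__":
    samples = [70, 72, 74, 76, 78]  # made-up readings, one per hour
    print(hours_until_full(samples), should_expand(samples))
```

A production system would account for seasonality and noisy readings, but the principle is the same: act on the projected trend rather than wait for the disk to fill.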

Thirdly, AIOps shortens outages, which happen no matter how simple the infrastructure is. The key is to keep downtime to a minimum, and this is where AIOps truly shines: machine learning is better than manual analysis at digging through an immense quantity of data to highlight the source of a problem. That way, administrators no longer have to go on a wide troubleshooting hunt.
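As a rough illustration of narrowing down the source of a problem, the sketch below ranks components by how far their latest reading deviates from their own recent baseline. It uses a basic z-score rather than a trained machine learning model, and the component names, metrics and values are invented for the example.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rank_suspects(history: Dict[str, List[float]],
                  current: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each component by the absolute z-score of its current reading
    against its own history, and return them most anomalous first."""
    scores = []
    for name, samples in history.items():
        mu = mean(samples)
        sigma = pstdev(samples)
        if sigma == 0:
            # Perfectly flat baseline: any deviation at all is treated as extreme.
            score = 0.0 if current[name] == mu else float("inf")
        else:
            score = abs(current[name] - mu) / sigma
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical latency readings in milliseconds.
    history = {
        "db": [10, 12, 11, 9, 8],
        "web": [100, 110, 90, 105, 95],
        "cache": [5, 5, 5, 5, 5],
    }
    current = {"db": 40, "web": 102, "cache": 5}
    for name, score in rank_suspects(history, current):
        print(f"{name}: {score:.2f}")
```

The administrator sees the database at the top of the list and can start there, instead of checking every component in turn.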

Less downtime also means lower cost of operation. According to the Digital Enterprise Journal, incident management can cost upwards of US$1.3 million on average each year. That is not to mention the cost of troubleshooting and the loss of customer revenue, which can easily add up in the long run. By cutting these costs, organisations can better allocate their resources to higher-value work.

A reality check

That is not to say that AIOps is a switch that organisations can flip on tomorrow and magically turn their CloudOps into a well-oiled machine. There are very real practical and technical challenges to consider when adopting AIOps. 

For example, interoperability is likely to be a major hurdle, as legacy tools tend not to be integration-friendly. How well AIOps works also depends on a few factors, including how ‘clean’ the Configuration Management Database is (it rarely is), as well as the cost of resources, which can run high, at least in the initial stages.

Culture, too, is a potential barrier. It is one thing for CloudOps teams to hand over parts of their jobs to AI. It is another for them to fully trust AIOps’ results and decisions. As key decision makers, Chief Information Officers (CIOs) might also not be ready to relinquish control of the entire IT operation to a relatively new technology. After all, early adopters are often the first to experience failures, and many organisations and CIOs do not necessarily have such risk appetites.

Change is in the air

With that said, these challenges should not deter your organisation from taking up AIOps — especially when many are already headed in that direction. 

According to Gartner, 30% of large enterprises will rely exclusively on AIOps and digital experience monitoring tools to monitor applications and infrastructure by 2023 — that’s up from a mere 5% in 2018. 

Going forward, the key is to be gradual and deliberate with AIOps implementation. Instead of seeing it as a silver bullet and making large, sweeping changes, start with simpler tasks, such as security, performance monitoring and incident response. Concurrently, upskill your staff, communicate how AIOps will benefit the overall IT process, and reassure them that no one will be left behind in the digital revolution.

Ultimately, change is the only constant, and the same applies to the digital landscape. Your CloudOps might be working just fine now, but technological investment is not just about what works now, but about what will work five or 10 years down the line. The only way to survive and thrive is to lean in, embrace the change and make full use of the opportunities as they present themselves.