A glimpse into the billion object club

Image courtesy of OpenText

OpenText, a Canadian enterprise information management company, is known for software that handles large volumes of content or data, structured or otherwise. What started as a spin-off from a University of Waterloo project in 1991 is now a $3.15 billion publicly traded company with more than 14,000 employees in over 140 offices around the world, servicing over 100,000 customers either on-premises or in the cloud.

Muhi Majzoub became part of that journey when he joined OpenText in 2012 as Senior Vice President of Engineering. Today, as Executive Vice President and Chief Product Officer, Majzoub is responsible for the vision and strategy for OpenText products, and their transition to the hybrid cloud.

Frontier Enterprise recently spoke with Majzoub about using AI in enterprise software, how OpenText handles remote work during the pandemic, and how companies are dealing with organising and managing objects that number over a billion in their enterprise.

When companies grow to a size where they start needing a sophisticated information and document management solution, what recommendations do you make to those embarking on that journey?

One of the first things I recommend to our customers is you take inventory of what you have in your company. Second, you define the low-hanging fruit in process automation. They’re going to bring value to your employees, customers, and partners. Don’t embark on a journey and take on 50 business processes at once. Do it one at a time and set priorities.

Set the right workflow journey, and the right departments that you want to migrate first, then develop the proof of concept in the OpenText solution that gives you the validation data sets or data points that you need to make sure that your business process is going to be supported 100% by the digitisation or automation you’re putting in place. Customers who have followed that process find themselves – in three to six months – deploying in production, bringing value to their organisation, and getting users excited about adopting a new solution or digital business process. Then they can go on to the next set of priorities, and the third, and the one after that.

The more they do these digital automations, the easier the process gets for them because it allows them to isolate any challenges that they ran into in the beginning, and then it speeds up the work of migrating users and content into the new solution that they put in place.

Customers need to put measurements in place to quantify the value and return on investment, and we encourage our customers to work with the OpenText team to define what those metrics are going to be.

What are some of the typical challenges you’ve seen your customers encounter during the migration process?

One of the biggest challenges that sometimes creates delays in a project is scope creep. That means you embark on a journey, you define a project plan, and then two-thirds of the way through, seven new things get added that confuse the migration team and the users who are going to do the user acceptance testing.

My recommendation has always been that once you put a project plan in place and lock it, you stick to that project plan until the project is complete and delivered. Turn it on, allow users to get into it and start using the solution, then move on and add any additional features.

With 98% of OpenText employees working remotely, how do you keep everything together as a company? 

We have business entities in 40+ countries and in 80+ locations around the world, so OpenText was always used to working remotely with our employees. Our development teams are in 30+ locations, so over the course of 2012-2014, our CEO supported us in putting the agile methodology in place and investing in our tooling. We brought many tools into our engineering organisation to support collaboration and whiteboarding. Most recently, during the pandemic, OpenText deployed Microsoft Teams so we can also collaborate and leverage whiteboarding.

Our development teams could be in Hyderabad, India; in Munich, Germany; and in Ontario, Canada, and they could collaborate in real time: early morning Canada time, early evening India time. They can collaborate on a project or an architecture design document. We’ve been doing it for many, many years.

As far as the enterprise is concerned, two and a half years ago, OpenText was fully digitised. We have a program internally we call OpenText on OpenText, meaning we take every one of our products, where applicable, and deploy it and run it at OpenText before we run it anywhere else. This allows us to digitise all our business processes – whether you’re in finance leveraging our vendor invoice management solution and document management, whether you’re in legal, whether you’re in marketing leveraging our visual experience. If you’re in corporate IT, you’re leveraging our Carbonite backup and Endpoint Security to protect every device in the enterprise, regardless of where it is.

When the pandemic hit, OpenText responded by delivering more capabilities and services in the cloud to better support the remote workforce. To better support our customers’ global supply chains, we delivered new invoice regulation compliance in our Active Invoices with Compliance app for countries like India, Peru, Vietnam, Chile, Turkey, Hungary, and many others. Today, we support 55+ countries with regulation and invoice compliance.

For customers who are now working remotely, we delivered Core Signature as a standalone product, and as a modern RESTful API service that our customers and partners can integrate into any proprietary application, so they can do digital signatures. That means an employee who, before the pandemic, used to run with papers to their manager for a signature can now, in three clicks, send that paper to their manager electronically and securely, with all the audit capability of tracking the signature IP and MAC address, and tracking where the signature took place, the time and day it took place – all of the above.

We enhanced our intelligent capture. We delivered Core Content as a multi-tenant public cloud application. We developed a new case management public cloud application that will allow our customers to deploy a case management with events workflow in less than two days. If you give us a requirement, our consulting team can build you a case management app in hours or a day, put it in production, and give you a URL with a secure username and password to leverage that product in under a week. So we have done a lot of things to also enable our customers to be more successful working from home or remotely.

How do I begin modernising an entirely paper-based workflow to an information or document management process within a company?

The best way to start with that is with intelligent capture, leveraging machine learning to ingest and digitise each of those papers that you have. With our intelligent capture, you could train the machine to ingest content, and discover the content as it ingests, then allow the administrator to modify its algorithm. Let’s say this document is related to a blood test that the patient took, versus this document which is the report of an x-ray, versus this document which is a prescription. Once the machine is trained, it is able to do discovery, learn on its own, and adapt. The first thing you want to do is protect that information by digitising and storing it, and capture as much of the metadata off it as you can.

Second thing, you leverage our auto-classification. As the machine is ingesting content and discovering the metadata, the auto-classification engine is taking one document at a time, doing the contextual analysis, and then determining: “Okay, this is a prescription, it goes into this folder, it gets attached to this patient ID,” versus “this is a medical report,” versus “this is an x-ray,” and it classifies and stores all that content with the right permissions.

Once you have all of that, a content administrator can now modify and manage that content more securely, and can apply records management and retention policies to that content, defining how many years the data needs to be kept. For example, if it’s a medical record, I need to keep it as long as the patient is alive and for a few years after the patient’s passing. If it’s a financial record and tax record in the US, by law, I need to keep it only for seven years, so you then define these record retention policies.
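The retention rules described above can be sketched in a few lines of Python. This is an illustration only, not OpenText's records management API: the `Record` class, the `disposal_date` function, and the five-year figure standing in for "a few years" are all assumptions made for the example. Only the seven-year period for financial records comes from the interview.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical retention rules, in years. Seven years is the figure quoted in
# the interview; the post-death period for medical records is an assumed
# placeholder for "a few years".
FINANCIAL_RETENTION_YEARS = 7
MEDICAL_YEARS_AFTER_DEATH = 5


@dataclass
class Record:
    record_type: str                      # "financial" or "medical"
    created: date
    patient_death: Optional[date] = None  # only relevant for medical records


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposal_date(record: Record) -> Optional[date]:
    """Earliest date the record may be disposed of, or None if it must be kept for now."""
    if record.record_type == "financial":
        return add_years(record.created, FINANCIAL_RETENTION_YEARS)
    if record.record_type == "medical":
        if record.patient_death is None:
            return None  # patient is alive, so the record is kept
        return add_years(record.patient_death, MEDICAL_YEARS_AFTER_DEATH)
    raise ValueError(f"no retention policy for {record.record_type!r}")
```

Returning `None` for a living patient keeps the "retain indefinitely for now" case explicit instead of encoding it as a far-future date.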

After that, you go on building the content. As enterprises try to digitise, they have millions of records. We have at OpenText what we call the billion object club. That means they are storing a billion documents or images in our solutions, either in Documentum or in Content Suite. They are managing all of these by leveraging machine learning and intelligent capture, by leveraging auto-classification, record management, and federated compliance, then leveraging Magellan on top of it to do predictive intelligence. Magellan can then run an algorithm and tell you your system is running at 60% utilisation and based on a five-year analysis, you will run out of database space in three years or in three months.
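To make the capacity prediction concrete, here is a minimal sketch of one way such a forecast could work: fit a straight line to past utilisation readings and extrapolate to 100%. The interview does not say which algorithm Magellan uses, so the linear fit, the function name `months_until_full`, and the monthly sampling are all assumptions.

```python
from typing import Optional, Sequence


def months_until_full(utilisation: Sequence[float]) -> Optional[float]:
    """Estimate months until 100% utilisation from evenly spaced monthly samples.

    Fits a straight line by ordinary least squares and extrapolates it.
    Returns None if usage is flat or shrinking.
    """
    n = len(utilisation)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(utilisation) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(utilisation))
    slope = sxy / sxx  # percentage points gained per month
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    fitted_now = intercept + slope * (n - 1)  # fitted value at the latest sample
    return max(0.0, (100.0 - fitted_now) / slope)
```

For example, six monthly readings that climb steadily from 50% to 60% imply a growth rate of 2 percentage points a month, so the remaining 40 points would be used up in about 20 months. Real storage growth is rarely this linear, which is why a production system would likely use a richer model.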

How do you see AI and machine learning impact OpenText’s own services, especially in the coming years?

Machine learning and AI are getting integrated into many areas, and I’ll give you a few real-time examples that already exist, and a few that will be coming in the very near future. Let me start with things that exist today.

Today, our BrightCloud intelligent service has a database of 22+ billion IP addresses and URLs on the Internet that are potentially risky. Those are URLs that have been known to be hacked or contain malware. We capture these and provide access to companies so they can call it in real time and identify if a URL or an IP address is secure or risky. Our media management solution, integrated with machine learning, can identify images and give you contextual analysis of an image that gets stored in our digital asset management solution.

We’re coming up with a new solution, and we call it Magellan Risk Guard, which manages the Magellan crawlers. If I asked you: You have a document management system and you’re the CIO, do you know how many documents you have and how much space they take up? You’ll tell me: “Of course I know. I have 7.3 million documents and they are taking 3 terabytes or 5 petabytes,” or whatever that information is. That’s easy to answer.

But if I tell you, depending on the country you live in, the regulations that the government has imposed, do you know if anybody is storing offensive political material in your documents? Or offensive imagery in your images? Or obscene language that could be a liability for your company?

The answer is, there is no way I’m going to read 7.3 million documents and review them because by the time I finish reading them, another hundred million documents may be added. That’s what Magellan Risk Guard is for. It leverages our latest machine learning and contextual content analysis to crawl the content in Documentum, Content Suite, or a file share on your network. In the future, we will crawl SharePoint, SharePoint Online, Box, Dropbox, and all of our SaaS applications through connectors.

We enable you to define business rules. If bomb-making material is not allowed in your environment, you define the rule and we’re going to look at every document. If there are seven documents that mention bomb-making material in them, we’re going to flag them for your administrator, HR department, or legal counsel to review the content in real time. We can quarantine the data until you approve it to become visible again. If anyone is storing child pornography, which is illegal in many, many countries, we allow you to investigate and identify those images and quarantine them, or put business rules to delete them. If anyone is putting in obscene or politically extreme language, based on what happened in the US election last year and the political unrest that took place, you can define business rules, then based on these business rules, define the content.
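The flag, quarantine, and approve cycle described above can be sketched as follows. This is a simplified stand-in: it matches literal phrases, whereas the product is described as using machine learning and contextual analysis. The `Rule` and `Document` classes, the `apply_rules` and `release` functions, and the action names are hypothetical and not part of any OpenText API.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Rule:
    name: str
    phrases: List[str]  # literal phrases that trigger the rule
    action: str         # "flag" or "quarantine" (hypothetical action names)


@dataclass
class Document:
    doc_id: str
    text: str
    quarantined: bool = False
    flags: List[str] = field(default_factory=list)


def apply_rules(docs: Iterable[Document], rules: Iterable[Rule]) -> List[str]:
    """Tag each document with the rules it matches; return the IDs needing review."""
    rules = list(rules)
    needs_review = []
    for doc in docs:
        for rule in rules:
            if not rule.phrases:
                continue  # an empty rule would otherwise match everything
            pattern = "|".join(re.escape(p) for p in rule.phrases)
            if re.search(pattern, doc.text, flags=re.IGNORECASE):
                doc.flags.append(rule.name)
                if rule.action == "quarantine":
                    doc.quarantined = True  # hidden until a reviewer approves it
        if doc.flags:
            needs_review.append(doc.doc_id)
    return needs_review


def release(doc: Document) -> None:
    """Reviewer approval: make a quarantined document visible again."""
    doc.quarantined = False
```

Keeping the matched rule names on each document gives the administrator, HR department, or legal counsel a record of why an item was held before they decide whether to release it.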

That’s a good point, because content used to be just documents but now you have audio, video, images, and chat that goes away within a certain amount of time. That is the future.

The content sprawl is all around us. I can tell you 25 years ago when I got my first laptop, it had 20 megabytes of hard disk space. In those days, I was working at Oracle Corporation, and it took me three to four years to fill up my disk drive and request an external drive. Today, my new Mac has a terabyte of storage, and in under a year I put 750 gigabytes of data on it. That’s the difference. Now take that and multiply it by thousands of people in an enterprise or in a government entity, and you’re literally talking petabytes of information growing every day by 1-2%.

From an R&D perspective, what is the most exciting thing happening in the OpenText labs at the moment?

I divide the innovation in our division into multiple areas. In the modern way of work, we are delivering a new product called Core for Content and that’s our public cloud multi-tenant content services platform. It allows small, medium, or enterprise companies to come online in under a day, sign up for a tenant, provision users, and start storing content in the cloud with mobile application access, with offline capability, with any other. That, for me, is really exciting because it supports the new, modern way of work.

Even when the pandemic is over, the modern way of work – in which many employees remain remote or partially remote – is here to stay. Not everyone is going to go back to their offices. This solution will integrate digital signatures, intelligent capture, Magellan Risk Guard, and many others.

Delivering to ethical supply chains and the global supply chain is very exciting for me, because OpenText today services 60,000 customers for our supply chain network with over a million partners and suppliers. We give them self-service capabilities; in minutes, they can register and send an invite to a partner to join their supplier network, very similar to how I would send a friend request on LinkedIn or Twitter.

The third area is Endpoint Security and Protection. Two or three weeks ago we heard about the gas pipeline in the US that was held to ransom and was shut down for almost a week. If my computer is held to ransom today, for any reason, I can restore and rebuild my laptop in under an hour. I could be up and running, with all of my data.

Plus, I have our Webroot Antivirus, where every time a document is downloaded or comes through email, it is being validated to be virus free, and I am able to continue to protect my device and the devices of every one of our employees.

Lastly, there is our advanced technology area with AI and machine learning: projects like Magellan Risk Guard, and projects like our local development with AppWorks, which are fully integrated in every one of our solutions. Those are very exciting, because they make our customers and partners more efficient and more productive, which means they can extract value. Instead of spending money on upgrades and wasted effort, they’re spending money on innovating and adding value to their customers and employees.