Building the backbone for AI readiness

This article is sponsored by Lenovo.


Artificial intelligence (AI) has penetrated almost every industry imaginable, with applications ranging from chatbots to code generation. AI is also often used as a complementary technology, as in Lenovo’s precision medicine initiative, which combines high-performance computing (HPC), genomics, and AI to speed up genome analysis.

But what exactly are the infrastructure requirements to support AI? And are enterprises across verticals prepared to make such investments?

During a podcast organised by Jicara Media and hosted by Lenovo, Sinisa Nikolic, Director of HPC and AI at Lenovo Asia-Pacific, discussed common roadblocks to AI readiness and ways to overcome them, among other topics.


“Ultimately, infrastructure is changing. Inclusion of GPUs is driving this massive ability to run machine learning. But then, the stresses on standard systems are just too great today. Not every machine that a client has in their data centre is enabled enough. (It) does not have the required memory, (and) does not have the amount of storage. Storage is something that everyone tends to sort of not think about as one of the first things (to prepare for),” Sinisa said.

According to him, data today is like oil: a resource with a value of its own.

“You can’t even ascribe a value to data. Today, it can be used in the right way, with these types of mining applications through AI and business intelligence, and all of these types of things, into just generating wealth for a corporation or even individuals. So systems themselves are not built (for this),” Sinisa explained.

Things to prepare for

Since AI has become ubiquitous, organisations must make certain changes to their IT infrastructure to harness the technology’s full potential, and must also ensure the people who will use it are ready.

But those are not the only decisions that have to be made, Sinisa noted.

“When you’re an organisation, it’s not just compute systems that change. How does one administer? How does one run the IT department today? Are you cloudifying applications? Are you going to run some of the training or inferencing models? Are you going to containerise? So it’s not just pure infrastructure. It is everything surrounding these things. How does one build up a machine-learning, supercomputing back end? Do I do it on-premises? Do I do it in the cloud? Do I want capital expense? Do I want an operating expense? How does one have to do that? So these things come into play.”

At present, some organisations, especially smaller ones, are still on the fence about whether AI will drive more value for them, while larger enterprises are ready to embrace the transformation, Sinisa remarked.

“They’re working on POCs, (and) they’re working on developments. They’re working through that, and really thinking how their data centre and infrastructure are going to change over time. But like everything, what physically gets changed, and what is physically updated or refreshed, are really based upon what types of use cases you’re going to be optimising or running,” he elaborated.

Sinisa then illustrated how AI works, vis-à-vis the IT infrastructure requirement: “As an example, you would need an incredibly powerful GPU-enabled supercomputer when you train a model. You might teach (it) the difference between a bird and an airplane, or you might train a system to understand if a circuit board has been populated correctly or not. Have the chips been put the right way? Are the diodes the correct polarity? Because AI’s cousin, the robot, may have picked one up wrongly, who knows? So this is a learning phase. And the learning phase, or the training phase, takes an incredible amount of compute.”
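The training phase he describes can be illustrated with a deliberately tiny sketch. The example below is purely illustrative and not Lenovo’s method: it trains a two-class logistic-regression model on synthetic feature vectors standing in for “bird” and “airplane” images, with each loop pass being one training iteration. Real models repeat this kind of loop over millions of iterations on far larger data, which is why training is the compute-heavy stage.

```python
import numpy as np

# Illustrative sketch only: a toy "training phase" for a two-class
# problem (say, bird vs. airplane). Data and model are hypothetical.
rng = np.random.default_rng(0)

# Synthetic 2-D feature vectors standing in for image embeddings.
birds = rng.normal(loc=-1.0, scale=0.5, size=(100, 2))
planes = rng.normal(loc=+1.0, scale=0.5, size=(100, 2))
X = np.vstack([birds, planes])
y = np.array([0] * 100 + [1] * 100)

# Logistic regression trained by plain gradient descent.
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):  # each pass = one training iteration
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= lr * (X.T @ (p - y)) / len(y)      # gradient step on weights
    b -= lr * np.mean(p - y)                # gradient step on bias

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = float(np.mean(preds == y))
print(f"training accuracy: {accuracy:.2f}")
```

Even this toy loop touches every sample 500 times; scaling the same idea to deep networks and image data is what drives the GPU and memory demands discussed above.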

“Most organisations have now certainly done a lot of that work in the cloud. They have realised that the cloud data model itself is costly, because the amount of training, the amount of iterative work (there are millions and millions of iterations), ends up costing a lot of money. So they’re bringing that back in-house, or on-premises,” he added.

After training the models, the next stage is deploying them.

“Today, you’re using lots of edge devices for that. So if you’ve taught that circuit board, for example, if you’ve gone through the training phase, and you understand that the circuit board needs to look a certain way, you could use mocks, very high-definition CCTV cameras, or microscopy, to look at those boards as they come through. And it would (peer) at the board to see if the components were inserted correctly by its cousin, the robot. Then it would make a determination of ‘What do I do with that board next?’ Do I flag it? Do I push it to the side? Do I send it back for a refresh?” Sinisa said.
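The deployment side of that pipeline boils down to acting on each board’s inspection verdict. The sketch below is hypothetical, not Lenovo’s implementation: it assumes the trained model emits a defect score per board, and routes the board to one of the three outcomes the quote names (pass it on, flag it aside, or send it back for rework). The thresholds are made up for illustration.

```python
# Illustrative sketch only: routing a board based on a trained model's
# defect score (0.0 = certainly good, 1.0 = certainly defective).
# Thresholds and names are hypothetical assumptions.

def route_board(defect_score: float) -> str:
    """Decide what happens to an inspected circuit board."""
    if defect_score < 0.2:
        return "pass"    # components look correct; send it onward
    if defect_score < 0.7:
        return "flag"    # uncertain; push aside for human review
    return "rework"      # likely defect; send back for a refresh

# A stream of boards coming off the line, each scored by the model.
decisions = [route_board(s) for s in (0.05, 0.45, 0.92)]
print(decisions)
```

This decision logic is trivial; the infrastructure challenge he goes on to describe is running the scoring model itself at the edge, fast enough to keep up with the line.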

“Now, all of these things require an infrastructure edge network. Do I do that (with) 5G? Do I do that hardwired? So all of this infrastructure needs to be thought about, and I’ve simplified stuff. But it’s incredibly complicated, and this all takes an enormous amount of compute. It takes an enormous amount of storage, because data is just being produced (petabytes upon petabytes and exabytes of data daily across the planet). You’ve got to manage these things, and you need a management infrastructure. You need to understand that you’re doing nearline storage, (or) offline storage,” he added.
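The nearline-versus-offline distinction he raises is essentially a tiering decision per dataset. The sketch below is a hypothetical illustration (tier names and age thresholds are assumptions, not a real product’s policy): data accessed recently stays on fast online storage, colder data drops to nearline disk, and rarely touched data moves to offline archive.

```python
from datetime import datetime, timedelta

# Illustrative sketch only: picking a storage tier for a dataset based
# on how recently it was accessed. Thresholds are hypothetical.

def pick_tier(last_access: datetime, now: datetime) -> str:
    age = now - last_access
    if age < timedelta(days=7):
        return "online"    # hot data on fast parallel storage
    if age < timedelta(days=90):
        return "nearline"  # cheaper disk, still directly accessible
    return "offline"       # tape/archive; retrieval takes longer

now = datetime(2024, 1, 1)
print(pick_tier(now - timedelta(days=1), now))    # accessed yesterday
print(pick_tier(now - timedelta(days=400), now))  # untouched for over a year
```

Policies like this are what a management infrastructure automates at petabyte scale, so administrators are not placing data by hand.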

Ready solutions

With all of these infrastructure preparations in mind, and especially as developments in 5G and the Internet of Things increasingly leverage AI, Lenovo has already developed a portfolio capable of integrating all the systems and hardware in place, Sinisa said.

“We sort of call it from desktop to data centre. It’s a data scientist infrastructure. As an example, for AI development, we have powerful development workstations, and mobile workstations where you do sandboxing, prototyping, and model development. From the training aspect, we have some powerful multiprocessor, multi-GPU-enabled (super)computing nodes, like our SR670, as an example. And you would scale out and do distributed training across those. It is optimised, especially for GPU, so you can scale as the model grows, scale as the users grow, and scale the network with that,” he described.

“We have water-cooled systems (and) air-cooled systems in that infrastructure that reside in a 19-inch rack, no problem at all. You do all that training aspect, and that connects up to our DSS infrastructure, our large-scale parallel I/O storage. It’s managed with a technology called LiCO (Lenovo Intelligent Computing Orchestration), which is Lenovo’s integrated cluster offering. It allows you to do all of the optimisation of workloads across that cluster, specifically AI workloads in this case, where you can move them to different systems of the supercomputer. Then once you’re done with all that training, you go through this inferencing stage, which is really where you’re deploying these models,” he added.

In closing, Sinisa emphasised the importance of enterprises taking advantage of what the latest technology has to offer. To that end, Lenovo is ready to partner with organisations on their digital transformation needs.

“We’re in refresh mode constantly, as new technologies become available. As our research team implements new software technologies like LiCO, which allows for ease of use and optimisation of these types of clusters, it just becomes exciting. It becomes a fully integrated technology partnership with a client, which I think is actually incredibly exciting for us,” he concluded.