Innovation through data and generative AI

Asia-Pacific is a hotbed of innovation, with businesses across the region embracing new technologies to gain a competitive edge. Generative AI is one of the most exciting developments to have emerged recently. Its adoption is increasing, and many businesses in the region have integrated it into their operations to drive innovation and growth.

According to Statista, the generative AI market in Asia is expected to grow annually by 22.65% between 2024 and 2030, reaching US$60.83 billion by 2030.

Generative AI can be deployed for tasks as diverse as simplifying project workflows, ensuring product quality, supporting design, coaching employees, producing content, and managing customer relationships. Its potential to transform organisations is immense.

However, the starting point of all these varied applications is data.

Beyond simply feeding the underlying large language model a large volume of data, that data must be well organised for the model to function effectively. Think of data and generative AI as interlocking pieces of a Mondrian puzzle, each segment crucial to the larger picture of business innovation. Data must be carefully curated and strategically utilised to unlock the full potential of generative AI.

To lay a strong foundation that allows firms to maximise ROI from their AI investments, corporate data must be meticulously prepared and optimally used with the right tools throughout its lifecycle.

The 5W-1H rule

In many organisations, the primary sources of data include internal product documentation, customer purchase histories, support information, and media like videos and images. For generative AI engines to effectively utilise this diverse data, meticulous organisation is crucial.

Enter the 5W-1H approach to data preparation:

  1. When: It’s important to know when and how often data is collected. In-house product documentation is created with each product release and update. Customer information is recorded in CRM or similar systems as it is generated. In the case of video and audio, files may be produced in real time. Systems must be built to collect data promptly as soon as it is created.
  2. Where: Data is located in numerous places. In-house product documentation may be edited on a local PC and stored on a file server or in the cloud. Customer information is typically stored in databases either on-premises or in the cloud. Video and audio are often generated at the edge and must be collected over the network. Systems need to be properly set up to gather data from each location.
  3. Who: Clearly defining who owns the various data sets is crucial. Only then can business owners and stakeholders collaborate effectively with data owners to better manage and protect their data and ensure consistent data usage.
  4. What: Data comes in various forms and formats. Product documents are typically stored in office file formats such as Word and Excel, while customer information is often organised as structured data in databases. Media files, which include video and audio, are unstructured. Understanding the types of data being handled enables organisations to preprocess and analyse them effectively.
  5. Why: When leveraging data for AI, the problems to be solved should be defined from the outset. This approach allows an organisation to focus on the most relevant data. Measurable numerical targets should also be established to track progress over time.
  6. How: Appropriate methods for data collection should consider the nature and location of the data. For instance, collecting data from file servers may involve protocols like NFS or CIFS, from databases may require specific accounts and database-specific protocols, and real-time data collection necessitates compatibility with edge devices.
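The 5W-1H checklist above can be captured as a lightweight data-catalogue record, so each source is fully described before ingestion. The sketch below is illustrative only; the field names, example sources, and owner addresses are hypothetical, not part of any particular catalogue product.

```python
from dataclasses import dataclass

@dataclass
class DatasetEntry:
    """Hypothetical 5W-1H catalogue record for one data source."""
    name: str
    when: str   # collection cadence, e.g. "per release", "real time"
    where: str  # physical/logical location of the data
    who: str    # accountable data owner
    what: str   # form and format: structured, office files, media
    why: str    # business problem this data supports
    how: str    # collection method or protocol

catalog = [
    DatasetEntry(
        name="product-docs",
        when="each product release and update",
        where="file server (on-premises)",
        who="product-team@example.com",
        what="unstructured office files (Word, Excel)",
        why="ground a support assistant in current product facts",
        how="NFS/CIFS share crawl",
    ),
    DatasetEntry(
        name="customer-history",
        when="on each transaction",
        where="CRM database (cloud)",
        who="sales-ops@example.com",
        what="structured rows",
        why="personalise recommendations",
        how="read-only database account, SQL export",
    ),
]

# Completeness check: every 5W-1H question must be answered
# before a source is approved for AI ingestion.
for entry in catalog:
    missing = [k for k, v in vars(entry).items() if not v]
    assert not missing, f"{entry.name} is missing: {missing}"
```

In practice such records often live in a data catalogue or governance tool rather than in code, but the discipline is the same: an unanswered question blocks ingestion.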

Boosting speed to deployment

After organising your data, the next step is to accelerate the deployment of your generative AI system. MLOps is key to streamlining workflows and speeding up the transition from AI development to production. At this stage, it’s essential to fully utilise your organisation’s storage infrastructure to support MLOps and data-centric operations such as data pipelines and DataOps.
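A data pipeline of the kind described here can be sketched as a short sequence of stages that feed curated data towards the model. The stage names and logic below are illustrative assumptions, not tied to any specific MLOps tool.

```python
# Minimal illustrative data pipeline: collect -> clean -> chunk.
# Stage boundaries are hypothetical; real pipelines would pull from
# file servers, databases, or edge devices and write to storage.

def collect(sources):
    # Flatten documents gathered from multiple sources.
    return [doc for src in sources for doc in src]

def clean(docs):
    # Normalise whitespace and drop empty documents.
    return [" ".join(d.split()) for d in docs if d.strip()]

def chunk(docs, size=50):
    # Split documents into fixed-size character chunks for retrieval.
    return [d[i:i + size] for d in docs for i in range(0, len(d), size)]

def run_pipeline(sources):
    return chunk(clean(collect(sources)))

chunks = run_pipeline([
    ["  Product guide v2: install steps  "],
    ["Release notes:  bug fixes"],
])
```

Keeping each stage a small, testable function is what lets MLOps tooling schedule, monitor, and rerun the pipeline as data changes.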

To facilitate a smooth and rapid deployment, the following key features of enterprise storage systems should be leveraged:

  1. Optimising enterprise data management
    Enterprise storage facilitates the collection of data from diverse sources through multi-protocol support, including Network File System (NFS) and Common Internet File System (CIFS), streamlining essential management tasks such as data protection, versioning, and security for AI applications. Moreover, recent advancements in container support enhance MLOps and DataOps, allowing data scientists to focus on AI model development.
  2. Advancing hybrid multi-cloud strategy
    The deployment of a hybrid multi-cloud environment and data mobility is crucial, especially for generative AI and large language models that rely on cloud-exclusive services and functionalities. Establishing a data pipeline between on-premises environments and the cloud offers flexibility and scalability to support AI initiatives. Enterprise storage provides features that work with cloud vendors’ object storage, or enable mirroring and caching within the cloud, facilitating a tailored hybrid cloud strategy for enterprises.
  3. Implementing security measures
    Security in AI is paramount as data is constantly exposed to the risk of cyberattacks. With regulations increasing, enterprise storage systems equipped with security features such as multi-tenancy and encryption keep data safe while managing it efficiently. Such systems also allow companies to hold multiple data sets in minimal space for auditing and compliance purposes. Understanding the nature of the data and applying these security measures enables organisations to drive transformation and deepen business insights.
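The idea of holding multiple data sets in minimal space can be illustrated with content-addressed deduplication, a technique many enterprise storage systems use. The toy store below is a sketch built on `hashlib`, not the API of any real storage product: identical blocks are stored once, so a second audit copy costs almost nothing.

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store (illustrative only).

    Identical blocks are kept once, so multiple copies of a
    data set occupy minimal extra space.
    """

    def __init__(self):
        self.blocks = {}  # sha256 digest -> block bytes

    def put(self, data: bytes, block_size: int = 8) -> list:
        """Store data as blocks; return digests referencing them."""
        refs = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # each unique block stored once
            refs.append(digest)
        return refs

    def get(self, refs: list) -> bytes:
        """Reassemble the original bytes from block references."""
        return b"".join(self.blocks[r] for r in refs)

store = DedupStore()
payload = b"audit-copy-2024 audit-copy-2024"
refs1 = store.put(payload)
refs2 = store.put(payload)  # second copy adds no new blocks
assert store.get(refs1) == store.get(refs2) == payload
```

Real systems pair deduplication with encryption and multi-tenancy so that space efficiency does not come at the cost of isolation between data sets.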

In the intricate tapestry of business evolution, data is like the pieces of a Mondrian puzzle, where each piece holds the potential for transformative insights. However, organisations must first understand the nature of the data they wish to leverage.

When data is varied and widely dispersed, the collection mechanism becomes more complex. Approaches such as MLOps help from the AI-centric operational perspective, while data pipelines and DataOps help from the data-centric one.

Appropriately blending these approaches will accelerate organisations’ generative AI programs, bringing business benefits sooner rather than later.