Data is generated at every step of our daily lives, and companies and organizations are racing to gather and analyze it within the bounds of local compliance and regulation. In Hong Kong, the government outlined as early as 2017 that it would open up suitable public data as raw material to encourage technological research and innovation. In June this year, the Hong Kong Monetary Authority (HKMA) published a three-year roadmap to integrate Artificial Intelligence, including Machine Learning (ML), into its regulatory technology to keep pace with the growing volume of data being produced. This will help the HKMA quickly capture and process structured and unstructured data from proprietary sources, identify risks and enhance controls.
So how should Hong Kong businesses in the private sector make better use of data science? This article introduces a few practical steps.
Currently, most businesses find that their existing tools and workflows limit how far they can take data science. Those who integrate data science and ML find it helps drive better business decisions and generates both income and insights. This approach applies to organizations across industries, from oil and gas to financial services and many in between.
However, developing and deploying ML workflows can be challenging. The work is often hampered by a lack of access to data, inadequate compute resources, the difficulty of managing interdependent libraries and package versions, and security constraints. To address these challenges, Red Hat launched OpenShift Data Science.
OpenShift Data Science provides a sandbox environment in which data scientists can develop, train and test machine learning models and deploy them in intelligent applications. It offers a supported, self-service environment where data scientists and machine learning engineers can carry out their daily work, from gathering and preparing data to testing and training ML models. With this service, customers can access a range of AI/ML technologies from Red Hat partners and independent software vendors, enabling organizations to build a flexible sandbox environment containing some of the latest data science tools.
The machine learning workflow
How can data science tools be linked to the ML workflow? Let’s first recap the stages of the ML workflow that data scientists typically follow when applying AI/ML to solve a business problem.
- The workflow begins with gathering and preparing data. Data often has to be combined from a range of sources, so exploring and understanding the data plays a key role in the success of a data science project. For example, the HKMA noted in its Fintech 2025 initiative that it plans to stimulate the industry's technology adoption through infrastructure enablements, such as establishing a Commercial Data Interchange, digital corporate identity, and a DLT-based credit data-sharing platform. By transforming the data infrastructure, the HKMA can gain access to more data.
- Once the data has been gathered, cleaned and processed, the second stage of the ML workflow can begin. When training a model, its parameters are tuned against a set of training data. Data scientists typically train a range of candidate models and compare their performance, weighing practical trade-offs such as training time and memory constraints.
- After model training, the next step of the workflow is moving the model into production. This step used to involve a hand-off between a data scientist and a developer, but increasingly data scientists themselves are expected to take responsibility for integrating models into applications.
- Finally, data scientists need to monitor the performance of models in production, tracking prediction and performance metrics.
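To make the four stages above concrete, here is a minimal sketch in Python of the same workflow on a toy problem. It uses only the standard library and deliberately invented data (a noisy linear relationship), so the data, model and thresholds are illustrative assumptions, not part of any HKMA or OpenShift Data Science offering; in practice each stage would use purpose-built tooling.

```python
import random

# Stage 1: gather and prepare data.
# Here we synthesize a noisy linear dataset (y ~= 2x + 1) and split it
# into training and held-out test sets, standing in for real data prep.
random.seed(0)
data = [(i / 10, 2 * (i / 10) + 1 + random.uniform(-0.5, 0.5))
        for i in range(100)]
train, test = data[:80], data[80:]

# Stage 2: train a model.
# Tune the parameters (w, b) of a simple linear model on the training
# data using gradient descent on the mean squared error.
w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
    grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
    w -= lr * grad_w
    b -= lr * grad_b

# Stage 3: "deploy" the trained model as a prediction function that an
# application could call.
def predict(x):
    return w * x + b

# Stage 4: monitor performance by tracking an error metric on data the
# model has not seen, as one would in production.
mse = sum((predict(x) - y) ** 2 for x, y in test) / len(test)
print(f"w={w:.2f}, b={b:.2f}, test MSE={mse:.3f}")
```

In a real project, each stage would be far richer: data preparation might involve joining multiple sources, training would compare several model families, deployment would wrap the model in a service, and monitoring would track prediction drift over time.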
By providing a unified, self-service sandbox environment with integrated tooling and access to a range of open source data science projects and proprietary software, OpenShift Data Science enables data scientists to focus on the task at hand and develop and train models rapidly in a more secure, supported environment.
With the ability to connect to GPUs on demand, model training and testing can be accelerated, reducing the time needed to develop models and gain insights and making the platform well suited to rapid prototyping and experimentation.
As more companies around the world extract useful insights from data, managers should keep a close eye on the constantly evolving regulatory environment. Ensuring that all data is gathered, stored and used lawfully is crucial for brand reputation and business success.