Inside the data hub at Petronas

Image courtesy of Untung Bekti Nugroho

Organisations have been using data for decades to inform their decisions. IT has enabled modern services like e-commerce and fintech, as well as emerging industries like artificial intelligence, machine learning, and smart manufacturing.

But there’s a problem: IT is also facing an exponential increase in the sheer amount of data generated and collected in computer systems. According to IDC, the total global information volume in 2007 was about 165 exabytes (EB). In contrast, the data volume consumed worldwide in 2022 is estimated at 97 zettabytes, or 97,000 exabytes, which is almost 590 times the 2007 figure.

With this data explosion, ensuring that an organisation’s data is correct, relevant, complete, and timely has become more challenging. To make the right business decisions, data needs to be healthy, which means everyone in the organisation can access the information they need without wondering if it’s accurate.

Talend, a software company based in Redwood City, California, specialises in data health. In 2020, Petronas – a Malaysian oil and gas company – chose Talend’s Data Fabric platform to help create an Enterprise Data Hub (EDH) that would serve as a single source of truth for data analytics and a central platform for accessing data.

Solving data problems

Because of Petronas’ extensive data environment and regulatory compliance requirements (it has over 400 subsidiaries operating in more than 50 countries), the company needed a data health solution to make sense of all the data it generates.

Habsah Nordin, Head of Enterprise Data at Petronas. Image courtesy of Petronas.

Datin Habsah Nordin, Head of Enterprise Data at Petronas, believes that data health isn’t so different from people’s own health.

“Data health is an ongoing discipline requiring preventative measures, effective treatments, and a supportive culture. Inaccurate or incomplete data can attack organisations like disease, leading to ineffective execution of business strategies, poor return on investment due to data integrity issues, penalties for regulatory non-compliance, even potential jail time or revocation of licences due to mismanagement of data,” she explained.

Because of this, Petronas wanted to take a holistic approach to data management, specifically in establishing its EDH. “Our vision of creating an EDH allows us to gather data from multiple business segments within our organisations,” said Datin Habsah.

Datin Habsah, however, understood that there would be no “one-size-fits-all” product that would address all of Petronas’ problems.

Talend’s data health solution addresses several data pain points, including the following:

  • Building and monitoring data pipelines for data ingestion and processing, measuring data health, and remediating data issues in line with defined standards.
  • Capturing information on data flows and transformation over time, including both business and technical metadata, to ensure consistent understanding of the data.
  • Enabling data governance through a secure single point of control that allows collaboration to improve data accessibility, accuracy, and business relevance, and supporting regulatory compliance through intelligent data lineage tracing and compliance tracking.
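
To make "measuring data health" in the first point more concrete, here is a minimal sketch in Python of how a pipeline might score a batch of records for completeness and validity and decide whether remediation is needed. It is an illustration only, not Talend's API: the field names, the 95% threshold, and every function name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical required fields for an illustrative production record.
REQUIRED_FIELDS = ("well_id", "country", "barrels_per_day")


@dataclass
class HealthReport:
    total: int
    complete: int
    valid: int

    @property
    def completeness(self) -> float:
        # Share of records with every required field filled in.
        return self.complete / self.total if self.total else 0.0

    @property
    def validity(self) -> float:
        # Share of records whose numeric value passes the range check.
        return self.valid / self.total if self.total else 0.0


def is_complete(record: dict) -> bool:
    """A record is complete when no required field is missing or empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def is_valid(record: dict) -> bool:
    """A record is valid when barrels_per_day is a non-negative number."""
    value = record.get("barrels_per_day")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value >= 0


def measure_health(records: list) -> HealthReport:
    """Count complete and valid records in one batch."""
    return HealthReport(
        total=len(records),
        complete=sum(is_complete(r) for r in records),
        valid=sum(is_valid(r) for r in records),
    )


def needs_remediation(report: HealthReport, threshold: float = 0.95) -> bool:
    """Flag the batch when either score falls below the assumed standard."""
    return report.completeness < threshold or report.validity < threshold
```

In practice the "defined standards" would come from the organisation's own governance rules rather than a hard-coded threshold.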

Achieving worthwhile outcomes

Given Petronas’ size and the legacy software and systems it has in place, implementing Talend’s Data Fabric platform ran into a few issues. Datin Habsah remarked that integration had some challenges in the initial stage.

Stu Garrow, SVP & GM for Asia Pacific at Talend. Image courtesy of Talend.

Echoing Datin Habsah’s views on the notion of an all-encompassing solution, Stu Garrow, SVP & GM for Asia Pacific at Talend, remarked that data health “means different things to different business stakeholders”, and that expectations cannot always be the same.

“Since supporting a data health strategy in the correct context is essential, the challenges of understanding the domain and its use of data are crucial,” said Garrow.

“The ability to work with large data workers in Petronas and meet their expectations all along the project remained a major challenging task apart from the usual technical one. As well as, complying with solid security and privacy policies in the organisation is always in our mind when dealing with data health solutions,” he added.

However, through close engagement, collaboration, and support from the Talend team, these challenges were addressed accordingly, said Datin Habsah. “The Talend Data Catalog now inventories over 200 data types from multiple data sources that are connected to EDH,” she shared.

The integration appears to have been well worth the effort.

According to Datin Habsah, Petronas has achieved results on several fronts since the data governance initiative was launched in 2020. These include about 30 digital use cases that the EDH is said to have enabled.

“We have already seen improvements in productivity, production optimisation, and forecast with an accurate understanding of trends,” she observed.

So what’s next for Petronas in terms of its data hub?

“For the future, we want to ensure that the EDH can cater for our existing databases and the ever-growing unstructured data with the purpose of providing access to these valuable data with ease,” Datin Habsah said.

“We’re continuously building and enhancing our capabilities in ingestion technology, storage strategy, big data processing, unstructured data modelling, data serving (e.g. streaming or near real time), advanced analytics, data governance solution, data access and security, and the data warehouse and business intelligence of EDH,” she added.

What data health means

Garrow said Talend protects sensitive data through built-in masking in the service.

“On top of that, whether customers use remote engines or cloud engines, their datasets remain on systems and data repositories that they manage. Metadata, Designs, Talend Jobs, Artifacts, and any other objects that Talend stores to provide services or for security reasons are isolated via tenant-specific schemas and tenant-specific data encryption keys,” he added.
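
The masking Garrow mentions can be illustrated with a short Python sketch. This is a generic example of two common masking techniques, partial masking and keyed pseudonymisation, and not a description of how Talend implements it; the function names, the sample fields, and the key handling are all assumptions.

```python
import hashlib
import hmac


def mask_partial(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` characters of a sensitive string."""
    if visible <= 0 or len(value) <= visible:
        # Too short to reveal anything safely: mask the whole value.
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def pseudonymise(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash so equal inputs still match in joins."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_record(record: dict, partial_fields: set, hashed_fields: set, key: bytes) -> dict:
    """Return a copy of the record with the named fields masked."""
    masked = dict(record)
    for field in partial_fields & record.keys():
        masked[field] = mask_partial(str(record[field]))
    for field in hashed_fields & record.keys():
        masked[field] = pseudonymise(str(record[field]), key)
    return masked
```

Partial masking keeps a value recognisable to the people who need to check it, while the keyed hash lets analysts link records without ever seeing the original value. In a real deployment the key would live in a secrets manager, not in code.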

From Talend’s perspective, good data health means an organisation’s top personnel and decision makers are confident in making data-driven decisions.

“Healthy data is critical for any company — but keeping data in good condition requires a careful balance between availability, usability, integrity, and security,” Garrow explained.

What sets Talend’s approach apart from competitors, said Garrow, is that their data fabric combines data integration, data integrity, and governance in one platform.

“Data fabric is a remedy for unhealthy data. Challenges such as rushed cloud transformation, poor governance, and negligible IT practices can result in a proverbial ‘data landfill’,” he remarked. “The popularity of data fabric solutions becomes more prevalent as these data landfills get bigger and impact more bottom lines. Data fabric prevents data landfills from forming by providing connective tissue for enterprises’ disparate data to be easily visualised and accessed from the same source.”

Integrating emerging tech

In addition to continuously refining their capabilities, Garrow said Talend plans to make their products smarter and more collaborative by using machine learning (ML).

He added that Talend plans to add capabilities like:

  • Discoverability (the ability to easily locate available data);
  • Alerting (the ability to configure conditions for data-related notifications);
  • Recommendations (using ML to classify data and make recommendations).
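
As a rough illustration of what configurable alerting conditions could look like, the Python sketch below pairs each rule with a condition evaluated against a set of data health metrics. It is a hypothetical design and not Talend's planned feature; the rule names, metric names, and thresholds are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class AlertRule:
    name: str
    condition: Callable[[Metrics], bool]  # returns True when the alert should fire
    message: str


def evaluate(rules: List[AlertRule], metrics: Metrics) -> List[str]:
    """Return one notification per rule whose condition holds for these metrics."""
    return [f"{rule.name}: {rule.message}" for rule in rules if rule.condition(metrics)]


# Hypothetical rules; a missing metric defaults to a value that does not fire.
RULES = [
    AlertRule(
        name="low_completeness",
        condition=lambda m: m.get("completeness", 1.0) < 0.95,
        message="completeness fell below 95%",
    ),
    AlertRule(
        name="stale_data",
        condition=lambda m: m.get("hours_since_refresh", 0.0) > 24,
        message="dataset not refreshed in over 24 hours",
    ),
]
```

Keeping each condition as a plain function means new alerts can be added as configuration without touching the evaluation logic.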

“Collaboration is a crucial concept in what we’re doing to continue supporting data health as a team sport. We will support collaboration by offering integrated workflow capabilities, offering each user population a window into the data health process,” Garrow concluded.