Organisations need to achieve data accessibility, not just data availability. That is, the users who depend on data must be able to access the data they need at a moment’s notice to be competitive and effective in today’s fast-moving business environment.
For years, we have been trying to democratise data, making it more accessible for analysis in order to deliver more value. Yet centralised data architectures that consolidate data in a single place, such as a data warehouse or data lake, tend to work against easy access to data.
Centralised data architectures come with notable limitations. One is lack of flexibility: no single central architecture can accommodate the needs of all the different departments within a large organisation. Another is slow data provisioning: centralising data from multiple sources – extracting, ingesting, and synchronising it – takes time, which prevents data consumers from accessing data on demand in response to real-time changes.
Zhamak Dehghani, Director of Emerging Technologies at Thoughtworks, recently introduced the idea of a decentralised data infrastructure called a data mesh to address these problems. A paradigm shift in data management, data mesh is designed to move organisations from monolithic architectures – such as data warehouses and data lakes – to decentralised architectures.
In a data mesh architecture, organisational units (called “domains”) are responsible for managing and exposing their own data to the rest of the organisation. The key benefit is that this approach cuts down on the iterations needed to make data usable, since each domain has a better understanding of how its data should be consumed. The approach also gives domains the autonomy to use the best tools for their requirements, removing the bottlenecks that commonly exist in centralised infrastructures.
Taking a layered approach
While data mesh is a promising new architecture, its decentralised approach to data management can also introduce challenges of its own, such as data silos and data duplication.
Interestingly, this is where data virtualisation can play a contributing role. Data virtualisation is a data integration technology that fits data mesh implementations like a glove: unlike extract, transform, and load (ETL) processes and other batch-oriented approaches, it allows for unified data access without physically moving the data.
Data virtualisation works as an enterprise-wide layer above an organisation’s diverse data sources. Data consumers issue a single query to the virtualisation layer, which spans all the underlying sources; the layer automatically retrieves the necessary data at query time, abstracting consumers from the complexities of the actual data access.
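To make this concrete, here is a minimal sketch of what a federated query against a virtualisation layer might look like from a consumer’s point of view. The DSN, credentials, and view names are hypothetical, as is the assumption of an ODBC endpoint, though many virtualisation platforms do expose ODBC/JDBC interfaces:

```python
import pyodbc  # assumes the virtualisation layer exposes an ODBC endpoint

# Connect to the virtualisation layer itself, not to any individual source.
# "DataVirtLayer" is a hypothetical DSN set up by the platform team.
conn = pyodbc.connect("DSN=DataVirtLayer;UID=analyst;PWD=example")

# A single SQL query spanning two virtual views whose underlying data may
# live in entirely different systems (say, a SaaS CRM and a warehouse).
# The layer works out where each view's data physically resides, fetches
# it at query time, and performs the join.
cursor = conn.cursor()
cursor.execute("""
    SELECT c.region, SUM(o.amount) AS revenue
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.region
""")
for region, revenue in cursor.fetchall():
    print(region, revenue)
```

From the consumer’s side this is just SQL against one endpoint; whether the data behind each view sits in a warehouse, a lake, or a SaaS application is the layer’s concern.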
While data virtualisation layers do not contain actual data, they store all the metadata needed to access data in the underlying sources, enabling organisations to automate role-based security and data governance protocols across the organisation from a single point of control.
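As an illustration of that single point of control, the sketch below pushes a couple of role-based rules to the layer once, so that every source behind it inherits them. The rule statements are generic SQL and purely illustrative; each real platform has its own security syntax:

```python
import pyodbc

# Hypothetical DSN for the virtualisation layer, as in the earlier sketch.
conn = pyodbc.connect("DSN=DataVirtLayer;UID=admin;PWD=example")

# Illustrative governance rules, defined once at the virtualisation layer
# rather than separately in every underlying source system.
GOVERNANCE_RULES = [
    # Role-based access: analysts may read the sales domain's views.
    "GRANT SELECT ON sales.orders TO analyst",
    # Lock down a sensitive domain view from general access.
    "REVOKE ALL ON finance.ledger FROM PUBLIC",
]

cursor = conn.cursor()
for rule in GOVERNANCE_RULES:
    cursor.execute(rule)  # every source behind the layer inherits the rule
conn.commit()
```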
Building data products
Data virtualisation enables domains to quickly create and implement virtual models from any data source, without having to understand the complexities of the sources that feed them. By minimising replication, it also speeds up the process of iterating on multiple versions of a data product.
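For example, a domain team might define a new version of a virtual model with ordinary DDL issued through the layer. Because the view is only metadata – no data is copied – a second or third version costs little to create. The schema, view, and source names here are hypothetical:

```python
import pyodbc

# Hypothetical DSN; the marketing domain connects to the shared layer.
conn = pyodbc.connect("DSN=DataVirtLayer;UID=marketing;PWD=example")

# A virtual model joining a SaaS source to the warehouse. Creating it
# moves no data; the layer simply records the definition as metadata.
conn.cursor().execute("""
    CREATE VIEW marketing.campaign_performance_v2 AS
    SELECT cmp.campaign_id,
           cmp.channel,
           SUM(ev.conversions) AS conversions
    FROM adplatform.campaigns AS cmp   -- exposed from a SaaS ad platform
    JOIN warehouse.web_events AS ev    -- exposed from the data warehouse
      ON ev.campaign_id = cmp.campaign_id
    GROUP BY cmp.campaign_id, cmp.channel
""")
conn.commit()
```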
These data products can then be made accessible via a flexible array of methods such as SQL, REST, OData, GraphQL, or MDX – and because developers do not need to write any code, the data products can be easily and automatically published in an organisation-wide data product catalogue. By centrally storing metadata, data virtualisation layers provide all the necessary ingredients for a full-featured, comprehensive catalogue of an organisation’s data assets, organised by domain.
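As a sketch of what consuming a catalogued data product over REST might look like – the endpoint, path, and response shape below are hypothetical stand-ins for whatever interface the layer publishes:

```python
import requests

BASE = "https://datavirt.example.com/api/v1"  # hypothetical REST endpoint

resp = requests.get(
    f"{BASE}/products/sales/revenue-by-region",   # hypothetical product path
    params={"year": 2022},                        # simple server-side filter
    headers={"Authorization": "Bearer <token>"},  # same central security applies
    timeout=30,
)
resp.raise_for_status()

# Hypothetical response shape: {"rows": [{"region": ..., "revenue": ...}]}
for row in resp.json()["rows"]:
    print(row["region"], row["revenue"])
```

The same product could equally be consumed via SQL, OData, or GraphQL; the point is that one published definition serves all of these interfaces.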
Designing data domain autonomy
Another essential benefit of data virtualisation is that it allows domains to autonomously select and scale the data sources that best complement their products and suit their specific needs. For instance, many business units already have data analytics systems in place, which they can reuse without having to introduce new skills; likewise, they can keep applications tailored to their SaaS platforms and operate their own data marts. Organisations can also leverage data virtualisation to prevent interference with other internal processes and to ensure adequate performance. On top of that, each domain can be scaled independently.
However, it is important to point out that data virtualisation does not replace monolithic repositories like data warehouses and data lakes; these repositories remain the dominant sources for certain data products. In such cases, placing a data virtualisation layer on top of the physical repositories ensures that data products are still accessed through the virtual layer, and are still governed by the same protocols that apply to the rest of the data mesh.
Strengthening the data mesh
Data mesh is an interesting new architecture that can help organisations avoid the drawbacks of highly centralised data infrastructures. Data virtualisation offers a complementary, modern data integration and data management technology that further strengthens the data mesh concept, allowing for a straightforward implementation with no need to replace legacy equipment or hardware.