Over the last few years, adoption of data virtualisation has grown steadily across industries, largely because of the benefits the technology brings to data management and use. Increasingly, data virtualisation is seen as a critical foundation for modern data architectures.
As adoption grows, data virtualisation is also discussed more frequently by analysts and vendors. However, the technology is not always described accurately. It is often likened to the early data federation systems of the late 1990s and early 2000s, which differ greatly from modern data virtualisation platforms.
To give a clearer picture of data virtualisation, I will address some common myths about the technology.
Myth 1: Data virtualisation is equivalent to data federation
When data virtualisation was first introduced, data federation was one of its primary capabilities. Data federation involves the ability to answer queries by combining and transforming data from two or more data sources in real time.
To do this, the data-access layer that data virtualisation establishes holds the metadata needed to access a variety of data sources and return results, often in a fraction of a second.
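As a rough illustration of federation, the sketch below uses Python's built-in `sqlite3` module, with two separate in-memory databases standing in for two independent source systems. The table names, columns and the `SOURCES` mapping are hypothetical. The point is only that one logical query is answered by combining rows from both sources at query time, without storing a copy of either dataset in between. A real platform would do this through its own query engine, not hand-written Python.

```python
import sqlite3

# Two independent in-memory databases stand in for two separate source systems
# (hypothetical: a CRM holding customers and a billing system holding orders).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Acme", "EMEA"), (2, "Globex", "APAC"), (3, "Initech", "EMEA")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(1, 100.0), (1, 50.0), (2, 75.0), (3, 20.0)])

# Minimal "data-access layer" metadata: which connection serves which table.
SOURCES = {"customers": crm, "orders": billing}

def federated_customer_orders():
    """Answer one logical query by combining rows from both sources at query time."""
    customers = {cid: (name, region) for cid, name, region in
                 SOURCES["customers"].execute("SELECT id, name, region FROM customers")}
    result = []
    for customer_id, amount in SOURCES["orders"].execute(
            "SELECT customer_id, amount FROM orders ORDER BY rowid"):
        name, region = customers[customer_id]
        result.append((name, region, amount))
    return result

print(federated_customer_orders())
```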
However, data virtualisation has since grown well beyond federation. Its toolset now includes capabilities such as advanced query acceleration, which can improve performance when working with slow data sources.
Data virtualisation solutions also provide sophisticated data catalogues and can build rich semantic layers into the data-access layer so that different consumers can access the data in their chosen form.
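A semantic layer can be pictured as a set of named views, each mapping raw source columns to the names and units a particular consumer expects. The minimal Python sketch below assumes a hypothetical `SemanticView` class, invented raw column names (`cust_nm`, `rgn_cd`, `rev_cents`) and an assumed currency. It is not how any specific product defines its views.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical raw rows as a source system might expose them: terse column
# names and amounts stored in cents.
RAW_ROWS = [
    {"cust_nm": "Acme", "rgn_cd": "EMEA", "rev_cents": 15000},
    {"cust_nm": "Globex", "rgn_cd": "APAC", "rev_cents": 7500},
]

@dataclass(frozen=True)
class SemanticView:
    """A named, consumer-facing view defined once in the data-access layer."""
    name: str
    columns: dict[str, Callable[[dict], object]]  # output column -> derivation from a raw row

    def rows(self, raw_rows):
        return [{col: derive(r) for col, derive in self.columns.items()} for r in raw_rows]

# Two consumers see the same underlying data in the form each prefers.
finance_view = SemanticView("finance_revenue", {
    "customer": lambda r: r["cust_nm"],
    "revenue_eur": lambda r: r["rev_cents"] / 100,  # assumed currency, for illustration
})
marketing_view = SemanticView("marketing_customers", {
    "customer_name": lambda r: r["cust_nm"],
    "region": lambda r: r["rgn_cd"],
})

print(finance_view.rows(RAW_ROWS))
print(marketing_view.rows(RAW_ROWS))
```

Because each view is only a mapping over the same rows, adding a new consumer means defining another view rather than copying the data.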
Myth 2: Data virtualisation overwhelms the network
The data sources used in analytics architectures often hold very large volumes of data, and those volumes continue to grow rapidly.
One might therefore assume that a data virtualisation platform must always pull large volumes of data across the network, especially when a single query federates data from several sources, and that this would severely degrade query performance.
In practice, the query acceleration capabilities mentioned above also minimise the amount of data that flows through the network. These techniques both improve performance and reduce network load, leaving the system free to handle a heavier query workload.
This is made possible by advances in the query execution engines of data virtualisation platforms, which act as coordinators that delegate most of the work to the relevant data sources. If a single data source can resolve a query on its own, all of the work is pushed down to that source.
If the query needs data from multiple source systems, the execution engine automatically rewrites it so that each source performs the relevant calculations (such as filters and aggregations) on its own data before sending the results to the data virtualisation platform. As a result, far less data is read over the network than with early federation tools.
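The rewrite described above can be sketched in Python, again using two in-memory `sqlite3` databases as stand-ins for separate systems. Both functions compute total order value per region and also report how many rows were transferred from the sources. The first mimics an early federation tool and pulls every raw order row. The second performs the kind of rewrite a data virtualisation optimiser might choose automatically: it asks the billing source to aggregate its own rows with `GROUP BY`, so only one row per customer crosses the "network". The schema, data and function names are hypothetical, and here the rewrite is done by hand.

```python
import sqlite3
from collections import defaultdict

def make_sources():
    """Build two hypothetical source systems: a CRM and a billing database."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "EMEA"), (2, "APAC"), (3, "EMEA")])
    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    # 10,000 order rows of value 1.0, spread across the three customers.
    billing.executemany("INSERT INTO orders VALUES (?, ?)",
                        [(i % 3 + 1, 1.0) for i in range(10_000)])
    return crm, billing

def revenue_by_region_naive(crm, billing):
    """Early-federation style: pull every raw row, then join and aggregate centrally."""
    customers = crm.execute("SELECT id, region FROM customers").fetchall()
    orders = billing.execute("SELECT customer_id, amount FROM orders").fetchall()
    region_of = dict(customers)
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[region_of[customer_id]] += amount
    return dict(totals), len(customers) + len(orders)

def revenue_by_region_pushdown(crm, billing):
    """Rewritten plan: each source aggregates its own data; only partial results move."""
    customers = crm.execute("SELECT id, region FROM customers").fetchall()
    # The GROUP BY runs inside the billing source, returning one row per customer.
    partials = billing.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id").fetchall()
    region_of = dict(customers)
    totals = defaultdict(float)
    for customer_id, subtotal in partials:
        totals[region_of[customer_id]] += subtotal
    return dict(totals), len(customers) + len(partials)

crm, billing = make_sources()
print(revenue_by_region_naive(crm, billing))
print(revenue_by_region_pushdown(crm, billing))
```

Both plans return the same totals, but the naive plan transfers 10,003 rows while the pushed-down plan transfers 6. If both tables lived in the same source, the whole query, join included, could be sent to that source instead.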
Myth 3: Data virtualisation means retrieving all data in real time
With data virtualisation, the default mode of query execution is to obtain the required data in real time, directly from the data source. This approach often performs well and is the execution strategy most commonly used by our customers.
However, advanced data virtualisation platforms also support additional execution methods to further improve performance and better accommodate slow data sources.
For instance, data virtualisation can fully replicate selected virtual datasets. This is useful in cases such as giving data scientists their own copy of the data, which they can modify without affecting the original.
Today, data virtualisation platforms offer a range of options, from no replication, through partial replication, to full replication. The choice is transparent to data consumers and can be changed at any time without affecting the original data source.
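The idea that the replication choice is transparent to consumers can be sketched with a hypothetical `VirtualDataset` class in Python. Consumers always call `query()`. The configured mode decides whether rows come live from the source (`"none"`), from a cache filled on demand (`"partial"`, one possible interpretation of partial replication), or from a full replica (`"full"`). The class, the `fetch` function and the sample rows are illustrative assumptions, not any product's API.

```python
class VirtualDataset:
    """Hypothetical virtual dataset: consumers always call query(), whatever
    replication mode is configured behind it."""

    MODES = ("none", "partial", "full")

    def __init__(self, fetch_from_source, mode="none"):
        self._fetch = fetch_from_source  # callable(region or None) -> rows; hits the source
        self._partial_cache = {}         # region -> rows, filled on first request
        self._full_copy = None           # complete replica, used only in "full" mode
        self.set_mode(mode)

    def set_mode(self, mode):
        """Change replication strategy; the source itself is never modified."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._partial_cache.clear()
        # A full replica is loaded once by fetching every row (region=None).
        self._full_copy = self._fetch(None) if mode == "full" else None

    def query(self, region):
        if self.mode == "full":
            return [row for row in self._full_copy if row["region"] == region]
        if self.mode == "partial":
            if region not in self._partial_cache:
                self._partial_cache[region] = self._fetch(region)
            return list(self._partial_cache[region])
        return self._fetch(region)  # "none": always read live from the source


# Hypothetical source system that records every call made to it.
SOURCE_ROWS = [{"customer": "Acme", "region": "EMEA"},
               {"customer": "Globex", "region": "APAC"}]
source_calls = []

def fetch(region):
    source_calls.append(region)
    return [dict(r) for r in SOURCE_ROWS if region is None or r["region"] == region]

ds = VirtualDataset(fetch)  # default "none": every query is a live read
ds.query("EMEA")
ds.query("EMEA")
ds.set_mode("partial")      # first query fetches, the repeat is served from cache
ds.query("EMEA")
ds.query("EMEA")
ds.set_mode("full")         # one fetch loads the whole replica
ds.query("APAC")            # served from the replica, no source call
```

Here the source is read twice in `"none"` mode, once for two identical queries in `"partial"` mode, and once to load the full replica, after which queries are served from the copy. The consumer-facing `query()` calls never change.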
Next-generation data virtualisation
Data is fast becoming the lifeblood of modern-day businesses that operate in increasingly digitalised environments. Many businesses turn to data virtualisation as a foundational technology to drive business performance, operational agility, and overall resilience.
Data virtualisation has evolved over the years to offer both better performance and more advanced functionality. It now incorporates emerging technologies such as artificial intelligence to automate manual tasks and speed up data analysis. These capabilities free IT teams to focus on innovation and other business objectives.
Business and technology leaders need to understand the true benefits and possibilities of data virtualisation. Only then can they fully appreciate the potential that the technology brings to modern analytics.