Data virtualization is not new, but it has changed over the years. The term describes a process of combining data on the fly from multiple sources rather than copying that data into a common repository such as a data warehouse or a data lake, which I have written about. There are many reasons for an organization concerned with managing its data to consider data virtualization, most stemming from the fact that the data does not have to be copied to a new location. It could, for instance, eliminate the cost of building and maintaining a copy of one of the organization’s big data sources. Recognizing these benefits, many database and data integration companies offer data virtualization products. Denodo, one of the few independent, best-of-breed vendors in this market today, brings these capabilities to big data sources and data lakes.

Google Trends presents a graphic representation of the decline of the popularity of the term data federation and the rise in popularity of the term data virtualization over time. The change in terminology corresponds with a change in technology. The industry has evolved from a data federation approach to today’s cost-based optimization approach. In a federated approach, queries are sent to the appropriate data sources without much intelligence about the overall query or the cost of the individual parts of the federated query. Each underlying data source performs its portion of the workload as best it can and returns the results. The various parts are combined and additional post-processing performed if necessary, for example to sort the combined result set.

Denodo takes a different approach. Its tools consider the costs of each part of the individual query and evaluate trade-offs. As the saying goes, there’s more than one way to skin a cat; in this case there’s more than one way to execute a SQL statement. For example, suppose you wish to create a list of all sales of a certain set of products. Your company has 1,000 products (maintained in one system) and hundreds of millions of customer transactions (maintained in another system). The federated approach would bring both data sets to the federated system, join them and then filter for the desired subset of products. An alternative would be to ship the table of 1,000 products to the system that holds the customer transactions, load it as a temporary table and join it to the customer transaction data there, so that only the desired subset of transactions is sent back. Today’s data virtualization evaluates the costs in time of the two alternatives and selects the one that would produce the result set the fastest.
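To make the trade-off concrete, here is a minimal sketch of how a cost-based choice between the two plans might look. It is not Denodo’s optimizer: the names (TableStats, choose_plan), the row sizes, the number of matching rows and the network throughput are all hypothetical, and the only cost considered is the time to move rows between systems.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical statistics a virtualization layer might keep per source table."""
    rows: int
    bytes_per_row: int


def transfer_seconds(rows: int, bytes_per_row: int, bytes_per_second: float) -> float:
    """Estimated time to move `rows` rows over a link of the given throughput."""
    return rows * bytes_per_row / bytes_per_second


def choose_plan(small: TableStats, large: TableStats, matching_rows: int,
                bytes_per_second: float = 100_000_000):
    """Compare two plans by estimated data-movement time and pick the cheaper one.

    pull_both:        copy both tables to the virtualization layer and join there.
    ship_small_table: copy the small table to the large table's system, join
                      there, and return only the matching rows.
    """
    ship_small = transfer_seconds(small.rows, small.bytes_per_row, bytes_per_second)
    costs = {
        "pull_both": ship_small
        + transfer_seconds(large.rows, large.bytes_per_row, bytes_per_second),
        "ship_small_table": ship_small
        + transfer_seconds(matching_rows, large.bytes_per_row, bytes_per_second),
    }
    return min(costs, key=costs.get), costs


# Assumed figures: 1,000 products, 300 million transactions, 2 million matches.
products = TableStats(rows=1_000, bytes_per_row=200)
transactions = TableStats(rows=300_000_000, bytes_per_row=100)
plan, costs = choose_plan(products, transactions, matching_rows=2_000_000)
print(plan, costs)
```

With these assumed numbers the plan that ships the small product table wins by a wide margin, because it avoids moving hundreds of millions of transaction rows. A real optimizer would also weigh factors this sketch ignores, such as the processing cost at each source and the cost of creating the temporary table.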

Data virtualization can make it easier, and therefore faster, to set up access to data sources in an organization. Using Denodo, users connect to existing data sources, which become available as a virtual resource. In the case of data warehouses or data lakes, this virtual representation is often referred to as a logical data warehouse or a logical data lake. No matter how hard you work to consolidate data into a central repository, there are often pieces of data that have to be combined from multiple data sources. We find that such issues are common. In our big data integration benchmark research about one-fourth (26%) of organizations said that data virtualization is a key activity for their big data analytics, yet only 14 percent said that they have adequate data virtualization capabilities.

Not all the work is eliminated by data virtualization. You must still design the logical model for the data that you want to provide, such as which tables and which columns to include, but that’s all. Virtualization eliminates load processes and the need to update the data. In the case of big data, there are no extra clusters to set up and maintain. The logical data warehouse or data lake uses the security and governance system already in place. As a result, users can avoid some of the organizational battles about data access since the “owner” of the data continues to maintain the rights and restrictions on the data. Our research shows that organizations that have adequate data virtualization capabilities are more often satisfied with the way their organization manages big data than are organizations as a whole (88% vs. 58%) and are more confident in the data quality of their big data integration efforts (81% vs. 54%).

In its most recent release, version 6.0, Denodo enhanced its cost-based query optimizer for data virtualization. Many of the optimizer’s features would be found in any decent relational database management system, but the challenge becomes greater when the underlying resources are scattered among multiple systems. To address this issue Denodo collects and maintains statistics about the various data sources that are evaluated at run time to determine the optimal way to execute queries. The product offers connectivity to a variety of data sources, both structured and unstructured, including Hadoop, NoSQL, documents and websites. It can be deployed on premises, in the cloud using Amazon Web Services or in a hybrid configuration.

Performance can be a key factor in user acceptance of data virtualization; users will balk if access is too slow. Denodo has published some benchmarks showing that performance of its product can be nearly identical to accessing data loaded into an analytical database. I never place much emphasis on vendor benchmarks as they may or may not reflect an actual organization’s configuration and requirements. However, the fact that Denodo produces this type of benchmark indicates its focus on minimizing the performance overhead associated with data virtualization.

When I first looked at Denodo, prior to the 6.0 release, I expected to see more optimization techniques built into the product. There’s always room for improvement, but with the current release the company has made great strides and addressed many of these issues. In order to maximize the software’s value to customers, I’d like to see the company invest in developing more technology partnerships with providers of data sources and analytic tools. Users would also find it valuable if Denodo could help manage and present consolidated lineage information. Not only do users need access to data, they need to understand how data is transformed both inside and outside Denodo.

If your organization is considering data virtualization technology, I recommend you evaluate Denodo. The company won the 2015 Ventana Research Technology Innovation Award for Information Management, and its customer Autodesk won the 2015 Leadership Award in the Big Data Category. If your organization is deluged with big data but is not considering data virtualization, it probably should be. As our research shows, it can lead to greater satisfaction with and more confidence in the quality of your data.

Regards,

David Menninger

SVP & Research Director

Follow Me on Twitter @dmenningerVR and Connect with me on LinkedIn.

In our definition, information management encompasses the acquisition, organization, dissemination and use of information by organizations to create and enhance business value. Effective information management ensures optimal access, relevance, timeliness, quality and security of this data with the aim of improving organizational performance. This goal is not easily met, especially as organizations acquire ever more data at an ever faster pace. In our business analytics benchmark research of more than 2,600 organizations, almost half (45%) have to integrate six or more types of data in their analyses. More than two-thirds reported that they spend more time preparing data than analyzing it. To assist in dealing with these sorts of issues and others, we’ve laid out an ambitious information management research agenda for 2012.

In recent years the complexity of information management has risen dramatically. The volume of information being processed has increased exponentially and so have the challenges of ensuring consistency and quality and managing governance and the information life cycle. New data types and sources such as comments on social media have emerged and must be integrated into an organization’s information assets. Moreover, in many cases the boundaries between organizations and the outside world with which they interact have become far less distinct, leading to the need for a more expansive understanding of information management. Our Business Data in the Cloud research shows that data is seldom stored in only one repository; the majority of organizations (86%) need to bring together cloud-based data and on-premises data.

We will provide new insights on the dynamics of the information management market as we complete research on Information Management Trends. This research will illuminate the priorities organizations place on data quality, master data management and data governance. It will also explore ways in which organizations are incorporating virtualization and replication for broader and faster data access. The growing volumes and sources of data will require data integration that can help facilitate better linkages across IT and into business. We will assess the vendors and products in a Value Index for Data Integration that will help determine which suppliers are the best fit for your enterprise.

Our research will also help organizations facilitate adoption and use of big-data technologies. Our recently published Big Data research highlights the role of various technology alternatives for managing data on a large scale. More than 80 percent of organizations utilize more than one technology to tackle their big-data challenges, but organizations lack maturity when incorporating these data sources. Specifically, our research shows that businesses have not adapted many of their standard processes to deal with big data. We’ll follow up this research by looking at specific vendor capabilities and how they can help extend information management processes to support big data.

Data is increasing not only in volume but in velocity as well – the speed with which data is generated and communicated. Technological developments such as smart meters, RFID, sensors and embedded computing devices for environmental monitoring, surveillance and other purposes are creating demand for tools that can derive insights from huge, continuous streams of event data coming into systems in real time. Traditional database systems are geared to manage discrete sets of data for standard BI queries, but event streams from sources such as sensing devices typically are continuous and their analysis requires different kinds of tools that enable users to understand causality, patterns, time relationships and other complex factors. These requirements have led to innovations in complex event processing, event stream processing, event modeling, visualization and analytics. We’ll be exploring how organizations can capitalize on real-time data collection and analysis in our benchmark research on operational intelligence and complex event processing. We will also assess vendors and products in a Value Index to determine the value of vendor offerings in Operational Intelligence to harvest the events from these streams of data.

Information management continues to be a strategic business imperative. It can help organizations improve their understanding and use of enterprise information and establish governance of it. To accomplish these aims they must manage the flow of information throughout the full life cycle of data and provide proper data stewardship to support the business while minimizing risk. Organizations also need simpler means of assembling information and deploying it to business users, some of whom may want to receive it through mobile technologies. We call these information applications; they can help provide timely access to information and should be coupled with an information management discipline. Our research will deliver education and best practices that can help you understand how to reduce the costs, time and risk of delivering these capabilities to your organization.

It will be a big year for information management, not only in technology but also in the methods and processes used to manage information and realize its full value within organizations. I look forward to connecting with all of you on LinkedIn, and you can follow me on Twitter.

Regards,

David Menninger – VP & Research Director
