On Monday, March 21, Informatica, a vendor of information management software, announced Big Data Management version 10.1. My colleague Mark Smith covered the introduction of v. 10.0 late last year, along with Informatica’s expansion from data integration to broader data management. Informatica’s Big Data Management 10.1 release offers new capabilities, including for the hot topic of self-service data preparation for Hadoop, which Informatica is calling Intelligent Data Lake. The term “data lake” describes large collections of detailed data from across an organization, often stored in Hadoop. With this release Informatica seeks to add more enterprise capabilities to data lake implementations.

This is the latest step in Informatica’s big data efforts. The company has been investing in Hadoop for five years, and I covered some of its early efforts. The Hadoop market has been evolving over that time, growing in popularity and maturing in terms of information management and data governance requirements. Our big data benchmark research has shown increases of more than 50 percent in the use of Hadoop, with our big data analytics research showing 37 percent of participants in production. Building on decades of experience in providing technology to integrate and manage data in data marts and data warehouses, Informatica has been extending these capabilities to the big data market and Hadoop specifically.

The Intelligent Data Lake capabilities are the most significant features of version 10.1. They include self-service data preparation, automation of some data integration tasks, and collaboration features to share information among those working with the data. The concept of self-service data preparation has become popular of late. Our big data analytics research shows that preparing data for analysis and reviewing it for quality and consistency are the two most time-consuming tasks, so making data preparation easier and faster would benefit most organizations. Recognizing this market opportunity, several vendors are competing in this space; Informatica’s offering is called REV. With version 10.1 the Big Data Management product will have similar capabilities, including a familiar spreadsheet-style interface for working with and blending data as it is loaded into the target system. However, the REV capabilities available as part of Informatica’s cloud offering are separate from those in Big Data Management 10.1. They require separate licenses, and there is no upgrade path or option; as a result, sharing work between the two environments is limited. Informatica faces two challenges with self-service: how users view its self-service capabilities and user interface compared with those of its competitors, and whether analysts and data scientists will be inclined to use Informatica’s products, since they are mostly targeted at the data preparation process rather than the analytic process.

The collaborative capabilities of 10.1 should help organizations with their information management processes. Our most recent findings on collaboration come from our data and analytics in the cloud research, which shows that only 30 percent of participants are satisfied with their collaborative capabilities. The new release enables those who are working with the data to tag it with comments about what they found valuable or not, note issues with data quality and point others toward useful transformations they have performed. This type of information sharing can help reduce some of the time spent on data preparation. Ideally these collaboration capabilities could be surfaced all the way through the business intelligence and analytic process, but Informatica would have to do that through its technology partners since it does not offer products in those markets.

Version 10.1 includes other enhancements. The company has made additional investments in its use of Apache Spark both for performance purposes and for its machine-learning capabilities. I recently wrote about Spark and its rise in adoption. More transformations are implemented in Spark than in Hadoop’s MapReduce, which Informatica claims speeds up the processing by up to 500 percent. It also uses Spark to speed up the matching and linking processes in its master data management functions.

I should note that although Informatica is adopting these open source technologies, its product is not open source. Much of big data development is driven by the open source community, and that presents an obstacle to Informatica. Our next-generation predictive analytics research shows that Apache Hadoop is the most popular distribution, with 41 percent of organizations using or planning to use this distribution. Informatica itself does not provide a distribution of Hadoop but partners with vendors that do. Whether Informatica can win over a significant portion of the open source community remains a question. Whether it has to is another. In positioning release 10.1 the company describes the big data use cases as arising alongside conventional data warehouse and business intelligence use cases.

This release includes a “live data map” that monitors data landing in Hadoop (or other targets). The live data map infers the data format (such as social security numbers, dates and schemas) and creates a searchable index on the type of data it has catalogued; this enables organizations to easily identify, for instance, all the places where personally identifiable information (PII) is stored. They can use this information to ensure that the appropriate governance policies are applied to this data. Informatica has also enhanced its security capabilities in Big Data. Its Secure@Source product, which won an Innovation Award from Ventana Research last year, provides enterprise visibility and advanced analytics on sensitive data threats. The latest version adds support for Apache Hive tables and Salesforce data. Thus, applications that require these capabilities can run in a more secure environment.
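To make the idea of format inference and a searchable index more concrete, here is a minimal sketch in Python. It is not Informatica’s implementation, which I have not seen; the patterns, the 80 percent threshold, the sample datasets and the function names `infer_type` and `build_index` are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical detection patterns; a real catalog would use far richer rules.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def infer_type(values, threshold=0.8):
    """Return the first label whose pattern matches at least `threshold`
    of the non-empty values, or "unknown" if none does."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return "unknown"


def build_index(datasets):
    """Map each inferred type to the (dataset, column) pairs where it appears."""
    index = defaultdict(list)
    for name, columns in datasets.items():
        for column, values in columns.items():
            index[infer_type(values)].append((name, column))
    return dict(index)


# Made-up sample data standing in for files landing in a data lake.
datasets = {
    "customers": {
        "tax_id": ["123-45-6789", "987-65-4321"],
        "joined": ["2015-01-02", "2016-03-21"],
    },
    "orders": {
        "note": ["rush", "gift"],
        "buyer_ssn": ["111-22-3333", ""],
    },
}

index = build_index(datasets)
# Every column that looks like it holds social security numbers.
print(index.get("ssn", []))
```

Looking up `"ssn"` in the resulting index lists each dataset and column that appears to hold social security numbers, which is the kind of query a governance team would run before applying a PII policy.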

The product announcement was timed to coincide with the Strata Hadoop conference, a well-attended industry event that many vendors use to gain maximum visibility for such announcements. However, availability of the product release is planned for the second quarter of 2016. As an organization matures in its use of Hadoop, it will need to apply proper data management and governance practices. With version 10.1 Informatica is one of the vendors to consider in meeting those needs.

Regards,

David Menninger

SVP & Research Director

The big data market continues to expand and enable new types of analyses, new business models and new revenue streams for organizations that implement these capabilities. Following our previous research into big data and information optimization, we’ll investigate the technology trends affecting both of these domains as part of our 2016 research agenda.

A key tool for deriving value from big data is in-memory computing. As data is generated, organizations can use the speed of in-memory computing to accelerate the analytics on that data. Nearly two-thirds (65%) of participants in our big data analytics benchmark research identified real-time analytics as an important aspect of in-memory computing. Real-time analytics enables organizations to respond to events quickly, for instance, minimizing or avoiding the cost of downtime in manufacturing processes or rerouting deliveries that are in transit to cover delays in other shipments to preferred customers. Several big data vendors offer in-memory computing in their platforms.

Predictive analytics and machine learning also contribute to information optimization. These analytic techniques can automate some decision-making to improve and accelerate business processes that deal with large amounts of data. Our new big data benchmark research will investigate the use of predictive analytics with big data, among other topics. In combination with our upcoming data preparation benchmark research, we’ll explore the unification of big data technologies and the impact on resources and tools needed to successfully use big data. In our previous research, three-quarters of participants said they are using business intelligence tools to work with big data analytics. We will look for similar unification of other technologies with big data.

The emergence of the Internet of Things (IoT) – an extension of digital connectivity to devices and sensors in homes, businesses, vehicles and potentially almost anywhere – creates additional volumes of data and brings pressure for data in motion for both analytics and operations. That is, the data from these devices is generated in such volumes and with such frequency that specialized technologies have emerged to tackle these challenges. We’ll explore in depth the myriad issues arising from this explosion of connectivity in our benchmark research on the Internet of Things and Operational Intelligence this year.

Another key trend we will explore is the use of data preparation and information management tools to simplify accessibility to data. Data preparation is a key step in this process, yet our data and analytics in the cloud benchmark research reveals that data preparation requires too much time: More than half (55%) of participants said they spend the most time in their analytic process preparing data for analysis. Virtualizing data access can accelerate access to data and enables data exploration with less investment than is required to consolidate data into a single data repository. We will be tracking adoption of cloud-based and virtualized integration capabilities and increasing use of Hadoop as a data source and store for processing of big data. In addition, our research will examine the role of search, natural language and text processing.

We suggest organizations develop their big data competencies for continuous analytics – collecting and analyzing data as it is generated. They should start by establishing appropriate data preparation processes for information responsiveness. Data models and analyses should support machine learning and cognitive computing to automate portions of the analytic process. Much of this data will have to be processed in real time as it is being generated. All of these advances will need advanced methods for big data governance and master data management. We look forward to reporting on developments in these areas throughout 2016 in our Big Data and Information Optimization Research Agenda.

Regards,

David Menninger

SVP & Research Director
