
On Monday, March 21, Informatica, a vendor of information management software, announced Big Data Management version 10.1. My colleague Mark Smith covered the introduction of v. 10.0 late last year, along with Informatica’s expansion from data integration to broader data management. The Big Data Management 10.1 release offers new capabilities, including self-service data preparation for Hadoop, a hot topic, which Informatica is calling Intelligent Data Lake. The term “data lake” describes large collections of detailed data from across an organization, often stored in Hadoop. With this release Informatica seeks to add more enterprise capabilities to data lake implementations.

This is the latest step in Informatica’s big data efforts. The company has been investing in Hadoop for five years, and I covered some of its early efforts. The Hadoop market has been evolving over that time, growing in popularity and maturing in terms of information management and data governance requirements. Our big data benchmark research has shown increases of more than 50 percent in the use of Hadoop, with our big data analytics research showing 37 percent of participants in production. Building on decades of experience in providing technology to integrate and manage data in data marts and data warehouses, Informatica has been extending these capabilities to the big data market and Hadoop specifically.

The Intelligent Data Lake capabilities are the most significant features of version 10.1. They include self-service data preparation, automation of some data integration tasks, and collaboration features to share information among those working with the data. The concept of self-service data preparation has become popular of late. Our big data analytics research shows that preparing data for analysis and reviewing it for quality and consistency are the two most time-consuming tasks, so making data preparation easier and faster would benefit most organizations. Recognizing this market opportunity, several vendors are competing in this space; Informatica’s offering is called REV. With version 10.1 the Big Data Management product will have similar capabilities, including a familiar spreadsheet-style interface for working with and blending data as it is loaded into the target system. However, the REV capabilities available as part of Informatica’s cloud offering are separate from those in Big Data Management 10.1. They require separate licenses, and there is no upgrade path or option; as a result, sharing work between the two environments is limited. Informatica faces two challenges with self-service: how well users view its self-service capabilities and user interface compared with those of its competitors, and whether analysts and data scientists will be inclined to use Informatica’s products, since they are targeted mostly at the data preparation process rather than the analytic process.
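To make the blend-on-load idea concrete, here is a minimal sketch in Python of joining two datasets on a shared key as they are loaded into a target, roughly the kind of step a spreadsheet-style preparation interface performs behind the scenes. This is my own illustration with invented field names and data, not Informatica’s implementation or API.

```python
# Hypothetical illustration of blend-on-load data preparation:
# two source datasets are joined on a shared key ("customer_id")
# before being written to the target system.

customers = [
    {"customer_id": 1, "name": "Acme Corp"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"customer_id": 1, "amount": 250.0},
    {"customer_id": 2, "amount": 99.5},
    {"customer_id": 1, "amount": 75.0},
]

def blend(customers, orders):
    """Join each order to its customer record on customer_id."""
    by_id = {c["customer_id"]: c for c in customers}
    blended = []
    for order in orders:
        customer = by_id.get(order["customer_id"], {})
        # Merge the two records; order fields win on any name conflict.
        blended.append({**customer, **order})
    return blended

rows = blend(customers, orders)
print(rows[0])  # {'customer_id': 1, 'name': 'Acme Corp', 'amount': 250.0}
```

The point of self-service tools is that business users assemble this kind of join interactively rather than writing it by hand.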

The collaborative capabilities of 10.1 should help organizations with their information management processes. Our most recent findings on collaboration come from our data and analytics in the cloud research, which shows that only 30 percent of participants are satisfied with their collaborative capabilities. The new release enables those who are working with the data to tag it with comments about what they found valuable or not, note issues with data quality and point others toward useful transformations they have performed. This type of information sharing can help reduce some of the time spent on data preparation. Ideally these collaboration capabilities could be surfaced all the way through the business intelligence and analytic process, but Informatica would have to do that through its technology partners since it does not offer products in those markets.

Version 10.1 includes other enhancements. The company has made additional investments in its use of Apache Spark, both for performance and for Spark’s machine-learning capabilities. I recently wrote about Spark and its rise in adoption. More transformations now run on Spark rather than on Hadoop’s MapReduce, which Informatica claims speeds up processing by up to 500 percent. It also uses Spark to speed up the matching and linking processes in its master data management functions.

I should note that although Informatica is adopting these open source technologies, its product is not open source. Much of big data development is driven by the open source community, and that presents an obstacle to Informatica. Our next-generation predictive analytics research shows that the Apache Hadoop distribution is the most popular, with 41 percent of organizations using or planning to use it. Informatica itself does not provide a distribution of Hadoop but partners with vendors that do. Whether Informatica can win over a significant portion of the open source community remains a question. Whether it has to is another. In positioning release 10.1 the company describes the big data use cases as arising alongside conventional data warehouse and business intelligence use cases.

This release includes a “live data map” that monitors data landing in Hadoop (or other targets). The live data map infers the data format (such as Social Security numbers, dates and schemas) and creates a searchable index of the types of data it has catalogued; this enables organizations to easily identify, for instance, all the places where personally identifiable information (PII) is stored. They can use this information to ensure that the appropriate governance policies are applied to the data. Informatica has also enhanced the security capabilities of Big Data Management. Its Secure@Source product, which won an Innovation Award from Ventana Research last year, provides enterprise visibility and advanced analytics on sensitive data threats. The latest version adds support for Apache Hive tables and Salesforce data. Thus applications that require these capabilities have a more secure environment available.
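Pattern-based inference of this sort is commonly implemented with regular expressions. The sketch below is my own illustration of the general technique, not Informatica’s algorithm: sample values from each column are matched against known patterns, each column is tagged with an inferred type, and a searchable index maps types to locations. The table and column names are invented.

```python
import re

# Hypothetical sketch of pattern-based data classification, the kind of
# inference a "live data map" performs: scan sampled values, tag each
# column with an inferred type, and index columns by that type.

PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def infer_type(values):
    """Return the first pattern that matches every sampled value, else None."""
    for type_name, pattern in PATTERNS.items():
        if values and all(pattern.match(v) for v in values):
            return type_name
    return None

def build_index(tables):
    """Map inferred type -> list of (table, column) locations."""
    index = {}
    for table, columns in tables.items():
        for column, sample_values in columns.items():
            inferred = infer_type(sample_values)
            if inferred:
                index.setdefault(inferred, []).append((table, column))
    return index

tables = {
    "hr.employees": {
        "ssn": ["123-45-6789", "987-65-4321"],
        "hired": ["2015-03-01", "2016-01-15"],
    },
    "sales.orders": {
        "order_date": ["2016-02-29"],
    },
}

index = build_index(tables)
# Locate every column that appears to hold PII:
print(index["ssn"])  # [('hr.employees', 'ssn')]
```

A governance process can then apply masking or access policies to every location the index returns, rather than relying on users to declare where sensitive data lives.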

The product announcement was timed to coincide with the Strata Hadoop conference, a well-attended industry event that many vendors use to gain maximum visibility for such announcements. However, availability of the product release is planned for the second quarter of 2016. As an organization matures in its use of Hadoop, it will need to apply proper data management and governance practices. With version 10.1 Informatica is one of the vendors to consider in meeting those needs.

Regards,

David Menninger

SVP & Research Director

I recently attended the SAS Analyst Summit in Steamboat Springs, Colo. (Twitter hashtag #SASSB). The event offers an occasion for the company to discuss its direction and to assess its strengths and potential weaknesses. SAS is privately held, so customers and prospects cannot subject its performance to the same level of scrutiny as public companies, and thus events like this one provide a valuable source of additional information.

SAS has been a dominant player in the analytics marketplace for years, celebrating its 40th anniversary this year and reporting US$3.16 billion in 2015 revenue. Both of these are significant accomplishments in the software market. And while validation of vendors, including the company’s viability, often ranks as the least important of the seven product evaluation criteria in our benchmark research, SAS ranks as one of the most commonly used tools for predictive analytics: One-third (33%) of participants in our predictive analytics benchmark research use it.

The company provides a very broad technology stack including information management, business intelligence, and many types of analytics and visualization. In addition, SAS offers domain-specific applications built on top of these capabilities. A quick count of the products and solutions page on its website shows hundreds of entries, and SAS executives at the event asserted that the company will generate more than 100 releases this year. So one challenge for customers, as with any other large vendor, is navigating through this maze to find something that suits their needs. Also, a vendor with a large portfolio is generally not as nimble as one with fewer products. On the other hand, a large vendor can more easily manage the interdependencies between products for its customers. That is, if an organization licenses a variety of products from multiple vendors, it often falls to the buyer to keep the different versions in sync.

At this year’s event, big data was much less prominent on the agenda than in the past. Over the last several years SAS has made significant investments in big data, supporting Hadoop and in-memory processing to create a scalable, high-performance infrastructure. This year it focused on end-user tools and prebuilt applications for working with data and analytics. In particular, SAS presenters identified three areas of focus in its technology investments: analytics, data management and visualization.

My colleague Mark Smith has written about the importance of visual discovery. SAS delivers visualization capabilities through its Visual Analytics product line, in which it continues to invest. A speaker claimed that more than 14,000 servers were licensed to run Visual Analytics as of 2015, up from 8,400 in 2014. SAS will be combining the capabilities of Visual Explorer and Visual Designer to create a more unified user experience. The company also is introducing visual data preparation features that enable users to explore and profile data as part of the data integration and transformation process.

No analysis should overlook SAS’s core competency of analytics. I was impressed with its demonstrations of automated evaluation and selection of different predictive analytics algorithms as part of its visualization capabilities. Visual Analytics offers a modern, intuitive user interface for analyzing big data sets, but it has only limited collaboration capabilities. Our big data analytics benchmark research finds that more than three in four respondents (78%) consider collaboration important or very important. SAS will need to make further investments to support collaboration.

SAS continues to advance another product, Visual Investigator, which it plans to bring to market more broadly in 2016. Announced last year as part of the Security Intelligence products, it is targeted at what SAS calls the “intelligence analyst,” a role between business analyst and data scientist. This tool has great potential as a standalone product: it combines tasks, activities, case management and the ability to take action, capabilities that have broad appeal across all analytic roles and beyond fraud and security investigations.

At the event, several analysts questioned SAS executives about the impact of open source systems on the company. This was a hot topic generating much discussion. Executives acknowledged that the company has lost ground to open source systems in the educational market and in response has reinvigorated its efforts there. The SAS University Edition is available for download and on Amazon Web Services. It is also available in a cloud-based offering called SAS On Demand for Academics. Dozens of universities that offer master’s programs in analytics are working with SAS to help address the shortage of skilled analytics resources. Our Next Generation Predictive Analytics research shows that lack of skills training and knowledge of the mathematics involved in predictive analytics are significant obstacles to users producing their own analyses, cited by 79% and 66% of participants respectively. It is wise for SAS to invest in supporting these programs. If your organization wants to hire or train additional resources, these programs may be a valuable resource.

On the technology front, executives pointed out that SAS has embraced open source systems, with support for Linux, Hadoop and the Python and Lua programming languages. However, it does not plan to support R, which is used by more than half (58%) of participants in our predictive analytics research. The company also plans to introduce a set of open application programming interfaces (APIs) to encourage more developers to work with SAS and create a marketplace of third-party products. The key will be whether these efforts collectively make a significant impact on the developer community, which is where open source tools often gain their foothold in organizations. At a minimum I expect SAS will need to offer a “freemium” version of its products if it really wants to win over the development community.

This event provides a valuable window into SAS’s performance and strategy. The company has proven its staying power and its ability to be a relatively fast follower of industry trends, remaining competitive in the constantly evolving business intelligence and analytics landscape. For organizations considering expanding their use of analytics, I recommend placing SAS on the list of vendors they evaluate.

Regards,

David Menninger

SVP & Research Director

 
