Cloudera’s recent Hadoop World 2011 event confirmed that the world of big data is getting even bigger. As I wrote of last year’s event, Hadoop, the open source large-scale data processing technology, has gone mainstream. And while 75% of the audience attended this year for the first time and so may not have realized the breadth of Hadoop’s acceptance, statistics announced in the opening keynote show widespread use of it. Mike Olson, Cloudera CEO, reported that the event was sold out, with 1,400 attendees from 580 organizations and 27 countries. In independent confirmation, our benchmark research shows that 54% of organizations are either using or evaluating Hadoop for their big-data needs.

Before or during Hadoop World, several vendors made announcements that further reinforced the growth of the market. Cloudera announced it has raised an additional $40 million to expand its operations. Oracle, a sponsor of the event, introduced its Big Data Appliance, which includes Hadoop. NetApp announced a partnership with Cloudera to provide a preconfigured, appliance-like solution called NetApp Open Solution for Hadoop. Hortonworks, a Cloudera competitor, unveiled its own distribution of Hadoop called Hortonworks Data Platform.

Despite these announcements, one of the main impressions I took away from the event is that this emperor has no clothes. If you recall the story, the emperor’s lack of clothes was not a metaphor for his authority but a question, literally and figuratively, of whom and what he had surrounded himself with. By many accounts, Hadoop is in a similar situation. It is a powerful but immature technology that has grown popular despite its shortcomings. In his keynote, Olson acknowledged this, saying that it’s not enough to provide a platform for Java developers. Doug Cutting, Cloudera’s architect and creator of Hadoop, described it as the kernel of a distributed operating system for big data. Not many end users are prepared to work directly with the kernel of an operating system.

Those users need the technology to be easier to handle and more broadly accessible. This issue shows up in our research finding that staffing and training are the two biggest obstacles to analyzing large-scale data sets with Hadoop. MapR, another Cloudera competitor with its own Hadoop distribution, is trying to capitalize on the need to overcome this obstacle by offering free training resources. And some vendors recognize the need to surround Hadoop with “better clothes.” Karmasphere and Datameer provide tools that make Hadoop easier to use in the analytics process. Informatica recently announced HParser, which makes it easier to parse the unstructured data often collected and analyzed with Hadoop. I spoke with other vendors at the event that are still in stealth mode, but we can expect continued development in the Hadoop ecosystem. Giving this trend momentum, venture capitalist Accel Partners announced a $100 million fund to invest in big-data companies.

This market is evolving with many moving parts. Hadoop is not one thing, but a collection of multiple projects. Hadoop’s distributed file system (HDFS) and MapReduce have been the cornerstones of Hadoop adoption; the majority of organizations use those two components, our research confirms. HBase, a columnar database built on HDFS, received a lot of attention at the event and was the subject of several presentations, including one about Facebook using HBase for real-time data access. Attendees were offered a free copy of HBase: The Definitive Guide, and several commented that perhaps the event should have been called HBase World.

So while the emperor has no clothes, there are plenty of tailors making suits. I expect increasing competition among different distributions of Hadoop and among existing and new tool vendors trying to make Hadoop easier to use. The advantage of all this interest in Hadoop is that the open source community is aware of many of the platform’s issues and is working to resolve them. The disadvantage is that, with all the separate components and so many competitors, the landscape will remain confusing until the market matures further.

Regards,

David Menninger – VP & Research Director

Cloudera is riding the wave of big data. I first learned about the company while working at Vertica, one of Cloudera’s partners. Customers that managed large amounts of structured relational data also needed to process large amounts of semistructured data such as the type found in web logs and application logs. The emerging channel of social media provided another source of data lacking the structure that would lend itself to analysis in a relational database. Other organizations needed to perform calculations and analyses that were difficult to express in SQL. Seeing this market, Cloudera recognized earlier than others an opportunity to leverage the Apache Hadoop project; it has been offering the Cloudera Distribution for Hadoop (CDH) since early 2009.

I first wrote about Cloudera last year after attending Hadoop World and seeing firsthand significant interest in Hadoop. Much has happened at Cloudera since then and also in the broader big-data market. Cloudera recently made CDH version 3 generally available. (My colleague Mark Smith wrote about CDH3 when it was first announced.) Cloudera says it intends to release additional distributions annually, so we should expect another release in early to mid-2012, although the recent entry of competitors into the Hadoop distribution market might prompt Cloudera to accelerate its releases.

In addition to the open source CDH releases, Cloudera offers an enterprise product that combines CDH with support and a set of management applications for authorization, provisioning, monitoring and resource management. The company has been working on version 3.5 of Cloudera Enterprise and proposes a release cycle for the enterprise product about twice as often as the annual releases of CDH. Version 3.5 includes real-time activity monitoring, an expanded file browser to show how files are used and their ownership, and extended authorization management and administration.

Perhaps as significant as the software developments, Cloudera has solidified its place in the market with key customer wins, additional funding, an expanded executive team and new partnerships. Last October, Cloudera announced $25 million in funding. Its partnership with Informatica, announced last fall, has borne fruit as part of Informatica 9.1, which I covered in a previous post. I’ve also covered Jaspersoft Version 4, whose features include support for Hadoop. In my opinion, these partners are pursuing Cloudera rather than the other way around.

Of course, success often provokes competition. Cloudera’s first-mover advantage in the Hadoop market has attracted attention in the form of alternatives, both direct, such as EMC offering its own distribution of Hadoop, and indirect, such as LexisNexis offering an open source version of its high-performance cluster computing system.

We recently completed research on the market requirements around big data, the benefits of adopting one of these alternatives and the obstacles as well. This research, the first of its kind, is the largest, most comprehensive study of issues related to the big-data market. We’ll be sharing some of our preliminary findings in a webinar next week hosted by two of the research sponsors. Time will tell which of these alternatives will succeed. As I’ve expressed in previous posts, I like competition, and you should, too, because it spurs vendors to offer better products at lower prices.

Regards,

David Menninger – VP & Research Director
