
I recently attended .conf2016, Splunk’s seventh annual user conference. Splunk created the market for analyzing machine data (shorthand for machine-generated data), which consists of log files and event data from various types of systems and devices. Our big data analytics benchmark research shows that these are two of the most common sources of big data that organizations analyze. This market has proven to be fertile ground for Splunk, growing steadily with revenues more than doubling over the previous two fiscal years. Machine data is also the backbone for the Internet of Things (IoT) and operational intelligence, which form the basis of forthcoming benchmark research from Ventana Research.

At the event, Splunk announced general availability of Splunk Cloud and Splunk Enterprise 6.5. The company also announced new versions of Splunk IT Service Intelligence, Splunk Enterprise Security and Splunk User Behavior Analytics. These new versions incorporate machine learning capabilities to help organizations analyze the massive volumes of machine data they collect with more advanced analytics and in a more automated manner. Machine learning has become a hot topic lately; it was also a popular subject at Strata+Hadoop World, as I wrote recently.

The machine learning capabilities, which arose in part from Splunk’s July 2015 acquisition of Caspida, have been added to Splunk Cloud and Splunk Enterprise 6.5. Machine learning is a method used to develop predictive analytics without explicitly programming the models. In effect the algorithms are designed to sift through the data, learn from it and make predictions. With Version 6.5 Splunk has also simplified its data preparation capabilities and enhanced its user interface to appeal to more types of users. The company also offers tighter integration with Hadoop in this version. Storing historical data in Hadoop can help lower costs, and the Hadoop data can be combined with data in Splunk Enterprise using the Splunk query capability, giving users a single unified interface.

Splunk IT Service Intelligence (ITSI), an application built on the Splunk platform, provides a view of how critical IT services are operating as well as an environment in which to investigate and triage incidents when they occur. The latest release of ITSI, 2.4, includes machine learning capabilities to perform anomaly detection, identifying unusual system activity to help prevent outages and service degradations. The system can learn what the pattern of normal operations looks like and then establish thresholds for alerts that adapt to cyclical changes in usage. Adaptive alerts help reduce “alert fatigue,” which sets in when so many alerts are issued that they overwhelm the recipients.
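To make the idea of thresholds that adapt to cyclical usage concrete, here is a minimal sketch in Python. It is not how ITSI implements anomaly detection; it simply assumes that "normal" can be approximated by the mean and standard deviation of a metric for each hour of the day. The function names, the multiplier k and the sample history are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def learn_thresholds(history, k=3.0):
    """Learn one upper alert threshold per hour of day.

    history is an iterable of (hour_of_day, value) pairs. The threshold for
    an hour is mean + k * standard deviation of the values seen in that hour
    (an assumed, deliberately simple model of "normal").
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    thresholds = {}
    for hour, values in by_hour.items():
        # stdev needs at least two samples; fall back to zero spread otherwise.
        spread = stdev(values) if len(values) > 1 else 0.0
        thresholds[hour] = mean(values) + k * spread
    return thresholds


def is_anomalous(thresholds, hour, value):
    # Hours never seen in the history have no baseline, so no alert is raised.
    return hour in thresholds and value > thresholds[hour]


# Made-up history: quiet at 3 a.m., busy at 2 p.m.
history = [(3, 10), (3, 12), (3, 11), (14, 100), (14, 110), (14, 105)]
thresholds = learn_thresholds(history)

night_alert = is_anomalous(thresholds, 3, 60)   # 60 is far above the 3 a.m. norm
day_alert = is_anomalous(thresholds, 14, 60)    # 60 is below the 2 p.m. norm
```

The same reading of 60 triggers an alert at 3 a.m. but not at 2 p.m., which is the behavior a single static threshold cannot provide and the reason adaptive thresholds can cut down on needless alerts.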

Splunk Enterprise Security (ES), a security information and event management (SIEM) application, provides real-time monitoring of security threats and an environment to support incident response teams. Splunk ES 4.5, the latest release, provides a similar adaptive alerting feature based on machine learning as described above. ES 4.5 now includes the Glass Tables feature that has been available in ITSI, which allows users to create custom visualizations and KPIs. Splunk User Behavior Analytics (UBA) complements ES by analyzing longer periods of history to create a profile of normal user behavior and comparing it with peers to provide more advanced detection of security threats. UBA 3.0 incorporates more than 40 machine learning models, which cover a combination of streaming and batch analytic scenarios. Splunk in 2015 received the Technology Innovation Award for CIO for its innovation in advancing cybersecurity through these products.

Splunk has followed a unique path. While a pioneer in the big data market, it built its products on a proprietary big data architecture rather than open source technologies as others did. In recent releases, however, it has broadened its support for Hadoop. Splunk focused on one subset of big data – machine data – and based much of its user interface around search. Rather than expand into the horizontal business intelligence market, the company has chosen to tackle the IT service market and the SIEM market. This focus appears to have served it well so far, and the results are hard to argue with. If you are looking for a way to manage and analyze the machine data in your organization, including IT service applications or enterprise security, I recommend you consider the offerings from Splunk.


David Menninger

SVP & Research Director

Follow Me on Twitter @dmenningerVR and Connect with me on LinkedIn.

Splunk may be one of the biggest software companies you’ve never heard of. I’ve been following the seven-year-old company for over six months now and recently attended its second annual user conference. Splunk focuses on analyzing large volumes of machine-generated data in underlying applications and systems, which includes application and system logs, network traffic, sensor data, click streams and other loosely structured information sources. Many of these “big data” sources are the same sources analyzed with Hadoop, according to our recently published benchmark research. However, Splunk takes a different approach that focuses on performing simple analyses on this data in real time rather than the batch-based advanced analytics we see as the most common use for Hadoop.

Although privately held, Splunk operates much like a public company and appears to be grooming itself for an initial public offering. In its fiscal year ended January 31, 2011, Splunk reported $66 million in revenue and has announced that its goals for FY 2012 include generating $100 million in revenue. With 68% and 70% growth in its first two quarters this year, Splunk appears to be on track to meet this goal. CEO Godfrey Sullivan, formerly CEO of Hyperion, has a successful track record in the business intelligence software space. All these indications suggest a promising future for the company. Data originates from a variety of sources in ever increasing volumes, and organizations are trying to figure out how they can maximize the value of this data. Splunk has grown rapidly because the tool is simple for IT professionals, whether an individual or a department, to adopt and use against machine or IT-specific data, for which our IT analytics benchmark research finds plenty of demand in IT.

As stated above, Splunk focuses on a specific segment of the big-data market: machine-generated data. This type of data originates constantly from many sources throughout an organization and in large quantities. The other common characteristic of machine-generated data is that generally it is less structured than data in typical relational databases. Often the information is captured as logs consisting of text files containing various record lengths and record structures. To effectively utilize this loosely structured information in real time, two challenges must be overcome: loading the data quickly and easily navigating through and analyzing the information once it is loaded. 

Splunk tackles the first challenge by loading the information in its raw form. No preprocessing is necessary, therefore no delay is introduced and no data is “lost.” Retaining all the raw data has business value as well. If you later decide that you want to investigate some new piece of information that previously you didn’t think was important, it will be available for analysis.

A search-based mechanism provides the solution to the second challenge. Our information applications research shows the importance of search, which ranked third on the list of very important analysis capabilities overall, and for end users specifically it topped the list of very important capabilities (46%), ahead of navigating to and retrieving information. Search-based access to analytics has been a large driver of growth and was highlighted by my colleague in 2009. Search overcomes the issues created by the lack of “structure” in the machine-generated data. In reality the data has plenty of structure – users search for strings representing occurrences of certain types of events. Splunk supplements the query mechanism with analytical functions that can be used to create aggregates, time-period comparisons and other common analyses. In addition, queries can be saved for reuse and as the basis of reports, dashboards and alerts. I heard anecdotal proof of the value of search at the Splunk user conference from two undergraduate students who, as part of their summer internship, had learned the Splunk query techniques quickly and implemented reports and analyses for monitoring the systems of a major financial services software company.
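The combination of raw loading, string search and simple aggregation can be illustrated with a short Python sketch. This is not the Splunk query language or its engine. The log lines, field names and helper functions below are invented for illustration, and the sketch assumes events carry key=value pairs that can be picked out at search time rather than at load time.

```python
from collections import Counter

# Invented sample of raw, loosely structured log lines, kept exactly as loaded.
RAW_LOG = """\
2011-08-01T10:00:01 host=web1 status=200 path=/login
2011-08-01T10:00:02 host=web2 status=500 path=/checkout
2011-08-01T10:05:10 host=web1 status=500 path=/checkout
2011-08-01T11:00:00 host=web1 status=200 path=/home
"""


def search(lines, term):
    """Return the raw lines that contain the search string."""
    return [line for line in lines if term in line]


def field(line, name):
    """Extract a key=value field from a raw line at search time."""
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and key == name:
            return value
    return None


def count_by(lines, name):
    """Aggregate matching lines by the value of one field."""
    return Counter(field(line, name) for line in lines)


lines = RAW_LOG.splitlines()
errors = search(lines, "status=500")

errors_by_host = count_by(errors, "host")
# Time-period comparison: bucket by the first 13 characters (date plus hour).
errors_by_hour = Counter(line[:13] for line in errors)
```

Because nothing was discarded at load time, a later question about a different field, such as path, can be answered from the same raw lines without reloading anything.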

Architecturally, Splunk employs massively parallel processing to spread the data and processing across a number of individual servers. At query time, a proprietary MapReduce mechanism – one not based on Hadoop – gathers the data from the individual nodes to satisfy the user’s request. Users do not need to know about the MapReduce mechanism. The translation of the query to the appropriate execution strategy is done automatically. However, as with any distributed data system, some knowledge of how the data gets distributed across the nodes can be helpful in identifying performance bottlenecks and tuning certain slowly running queries.
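The general scatter-and-gather pattern described here can be sketched in a few lines of Python. Splunk's mechanism is proprietary, so this is only an assumed illustration of the MapReduce idea: each node computes a partial result over its own slice of the data, and a coordinator merges the partial results. The node names, events and functions are hypothetical, and threads stand in for separate servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data distribution: each node holds its own slice of the events.
NODES = {
    "node-a": ["ERROR disk full", "INFO ok", "ERROR timeout"],
    "node-b": ["INFO ok", "ERROR timeout"],
    "node-c": ["WARN slow", "INFO ok"],
}


def map_node(events, term):
    """Map step: count severity levels of matching events on one node only."""
    return Counter(event.split()[0] for event in events if term in event)


def reduce_counts(partials):
    """Reduce step: merge the per-node partial counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def distributed_count(nodes, term):
    # Run the map step for every node concurrently, then merge the results.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda events: map_node(events, term),
                                 nodes.values()))
    return reduce_counts(partials)


all_levels = distributed_count(NODES, "")        # empty term matches every event
timeouts = distributed_count(NODES, "timeout")
```

The caller sees only distributed_count and never the per-node work, which mirrors the point that users do not need to know about the underlying mechanism. It also hints at why data distribution matters for tuning: if one node held most of the events, its map step would dominate the query time.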

The currently released version, Splunk 4.2, was introduced earlier in 2011 and includes real-time alerting on streams of data. It also includes a new agent-based data collection mechanism, called a universal forwarder, that makes the task easier and provides more reliability when collecting data from multiple endpoints or devices. Splunk separates the workload between indexers that perform the data loading and search heads that execute the queries. Version 4.2 introduced search-head pooling for load balancing so searches can be directed to any one of the search heads; it also provides high availability among the search processes.

At the conference, Splunk introduced version 4.3 and made the beta version available to registered users. One of the more popular demonstrations was Splunk 4.3 running as a non-Flash application on the iPad. The company also made a number of announcements of specific applications and extensions of the product. Splunk Storm provides visibility and operational analytics of cloud-based applications. Splunk App for Citrix XenDesktop and Splunk App for VMware provide visibility into virtualized and private cloud environments. The company also introduced a software development kit (SDK) for the Python programming language, which is open source and available on GitHub.

The Splunk product is not perfect, of course. Continued investment in the user interface is needed to make it easier to use. Currently users have to learn the Splunk syntax – I was introduced to those internals to show that this is easy – and a graphical query interface also would make the product more widely usable. When I probed about high availability, it became clear that you can use the Splunk tools to load dual systems to have a standby system in case of failure, but that’s not done automatically. Still, the company representatives were open about the product’s shortcomings with both me and its customers, which was refreshing.

Nearly every organization has some form of machine data. Splunk says that more than 2,900 enterprises have found a reason to purchase its products. The company’s mission is now to raise its visibility and broaden its applicability. Splunk provides a free, limited-capability version of its product so you can try it for yourself and see if it applies to your needs.


David Menninger – VP & Research Director
