Tableau Software officially released Version 6 of its product this week. Tableau approaches business intelligence from the end user’s perspective, focusing primarily on delivering tools that allow people to easily interact with data and visualize it. With this release, Tableau has advanced its in-memory processing capabilities significantly. Fundamentally, Tableau 6 shifts from the intelligent caching scheme used in prior versions to a columnar, in-memory data architecture in order to increase performance and scalability.

Tableau provides an interesting twist in its implementation of in-memory capabilities, combining in-memory data with data stored on the disk. One of the big knocks against in-memory architectures has been the limitation imposed by the physical memory on the machine. In some cases products were known to crash if you exceeded the memory. In other cases the system didn’t crash, but it performed so much slower once you exceeded the memory that it almost appeared to have crashed.

The advent of 64-bit operating systems dramatically raised the theoretical memory limits that existed in 32-bit operating systems. Servers can now be configured with significant amounts of memory at prices that are within reason, but putting your entire warehouse or large-scale data set in-memory on a single machine is still a stretch for most organizations. With Tableau 6 a portion of the data can be loaded into memory and the remainder left on disk. Coupled with the feature that allows links to data in an RDBMS, this provides considerable flexibility. Data can be loaded into memory, put on disk or linked to one of many supported databases. As the user interacts with data, it will be retrieved from the appropriate location. Tableau 6 also includes assistance in managing and optimizing the dividing line between data in-memory and on-disk, based on usage patterns.
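
The split between memory and disk can be pictured as a tiered lookup. The Python sketch below is only an illustration of that general idea and says nothing about how Tableau implements it; the class name TieredStore, its methods, the pickle file format and the "evict the least-used column" rule are all assumptions made for the example.

```python
import os
import pickle


class TieredStore:
    """Toy two-tier store: a bounded in-memory dict backed by files on disk.

    Hypothetical illustration only; not any vendor's actual design.
    """

    def __init__(self, memory_limit, directory):
        self.memory_limit = memory_limit  # max number of columns kept in RAM
        self.directory = directory        # where the on-disk copies live
        self.memory = {}                  # column name -> list of values
        self.hits = {}                    # column name -> access count

    def _path(self, name):
        return os.path.join(self.directory, name + ".col")

    def load(self, name, values):
        """Persist a column to disk; it enters memory only when it is read."""
        with open(self._path(name), "wb") as f:
            pickle.dump(values, f)

    def get(self, name):
        """Return a column, reading it from disk and promoting it on a miss."""
        self.hits[name] = self.hits.get(name, 0) + 1
        if name in self.memory:
            return self.memory[name]
        with open(self._path(name), "rb") as f:
            values = pickle.load(f)
        if len(self.memory) >= self.memory_limit:
            # Usage-based placement: evict the least-used resident column.
            coldest = min(self.memory, key=lambda n: self.hits[n])
            del self.memory[coldest]
        self.memory[name] = values
        return values
```

A caller would create the store with a directory and a memory limit, call load() for each column and then call get() as the user touches data; frequently used columns stay resident while rarely used ones fall back to disk.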

However, one of the places where this new architecture comes up short is in the data refresh process. In the current Tableau 6, users must manually request a refresh of the data that is currently in-memory. Ideally there should be an optional automated way to keep the in-memory data up to date. The other thing I would like to see in Tableau 6 and other in-memory products is better read/write facilities. Although this version includes better “table calcs,” which can be used to display some derived data and support some limited what-if analysis, there is no write-back capability that would let you use Tableau as a planning tool and record the changes you explore.

Tableau 6 includes a number of other features beyond the in-memory capabilities. It now supports a form of data federation in which data from multiple sources can be combined in a single analysis. The data can be joined on the fly in the Tableau server. Tableau refers to this as “data blending.” Users can also create hierarchies on the fly simply by dragging and dropping dimensions. And there are some new interactive graphing features such as dual-axis graphs with different levels of detail on each axis and the ability to exclude items from one axis but not the other, which can be helpful to correct for outliers such as the impact of one big sale on profitability or average deal size.
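
One way to picture combining two sources on a shared dimension is an aggregate-then-join. The Python sketch below is a generic illustration under that assumption, not a description of Tableau's internals; the function name blend and the sample field names are hypothetical.

```python
from collections import defaultdict


def blend(primary, secondary, key, measure):
    """Left-join `secondary` onto `primary` after summing `measure` per `key`.

    Both inputs are lists of dicts, standing in for rows from two sources.
    """
    totals = defaultdict(float)
    for row in secondary:
        totals[row[key]] += row[measure]

    blended = []
    for row in primary:
        merged = dict(row)
        # .get() avoids creating entries; unmatched keys come back as None.
        merged[measure] = totals.get(row[key])
        blended.append(merged)
    return blended


# Hypothetical rows: sales from one source, quotas from another.
sales = [{"region": "East", "sales": 100}, {"region": "West", "sales": 80}]
quotas = [{"region": "East", "quota": 60.0}, {"region": "East", "quota": 50.0}]
combined = blend(sales, quotas, key="region", measure="quota")
```

Every row of the primary source survives the join, and a region with no match in the secondary source simply carries an empty value for the blended measure.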

This release also supports several new data sources, including the Open Data Protocol (OData), the DataMarket section of Windows Azure Marketplace and Aster Data, which my colleague recently assessed.

Version 6 also includes some IT-oriented enhancements. As Tableau has grown, its software has been deployed in ever-larger installations, which places a focus on its administrative facilities. The new release includes improved management of large numbers of users, with grouping, assignment of privileges and specific selection and edit options. It also includes a new project view of objects created and managed within Tableau. All of these help move it toward departmental and enterprise-class analytics technology.

Overall, the release includes features that should be well received by both end users and IT. It shares the end-user analytics category with QlikView 10, which I recently assessed, and Tibco Spotfire. I’ll be eager to see if the company can push the in-memory capabilities even further in future releases. It is clear that Tableau brings another viable option to the category of analytics for analysts with new in-memory computing and blending of data from across data sources.

Let me know your thoughts or come and collaborate with me on Facebook, LinkedIn and Twitter.

Regards,

David Menninger – VP & Research Director

Interest in and development of in-memory technologies have increased over the last few years, driven in part by widespread availability of affordable 64-bit hardware and operating systems and the performance advantages in-memory operations provide over disk-based operations. Some software vendors, such as SAP, whose High-Performance Analytic Appliance (HANA) project has been advancing with momentum, have even suggested that we can put our entire analytic systems in memory.

I hope it will be helpful to take a look at what an “in-memory” system is, what it is good for and what some of the concerns about it are. First of all, nearly all systems involve some combination of memory and disk operations, but the roles each of these plays may differ. The fundamental value proposition relates to the greater speed of memory-based operations vs. disk-based input/output (I/O) operations. It is easy to understand that computer operations in memory can be significantly faster than any operation involving I/O. Many types of system performance have been enhanced by leveraging memory in the form of caches. If information can be retrieved from a cache rather than the disk, the operation will complete more quickly.
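
The caching idea is easy to demonstrate with Python's standard functools.lru_cache. In this minimal sketch, read_from_disk is a hypothetical stand-in for a slow I/O call (it performs no real disk access) and the cache size is an arbitrary assumption.

```python
from functools import lru_cache


def read_from_disk(key):
    """Stand-in for a slow disk read (hypothetical; no real I/O happens)."""
    return key.upper()


@lru_cache(maxsize=1024)
def read_cached(key):
    """Serve repeated requests from memory; go to 'disk' only on a miss."""
    return read_from_disk(key)


read_cached("customer")   # miss: falls through to read_from_disk
read_cached("customer")   # hit: answered from the in-memory cache
stats = read_cached.cache_info()  # reports hits, misses and current size
```

The second request never reaches the slow path, which is the whole benefit: the more often the same data is requested, the larger the share of work done at memory speed.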

What types of applications can benefit from in-memory technology? Very fast, high-volume transaction processing can be accomplished using one type of in-memory technology. Examples include IBM solidDB, Oracle TimesTen, Membase and VoltDB. Complex event processing (CEP) is another type of in-memory system. Examples of CEP include IBM, Progress Software’s Apama, Streambase, Sybase Aleri (recently bought by SAP) and Vitria. Other types of analytics can be performed in-memory, including more conventional query and analysis of historical data. Beyond SAP, there are QlikView, which I recently assessed, Tibco Spotfire and now Tableau. All of these systems deal with historical data. Another category of in-memory systems involves forward-looking calculations, models and simulations. Examples include IBM Cognos TM1 and Quantrix, which my colleague recently covered (see “Quantrix Gets Pushy with Plans”).

Over the years database performance has been greatly improved by advances in caching schemes. A logical extension of caching might be to put the entire database in memory and eliminate any disk-based operations. Well, it’s not quite that simple. There are some complexities, such as recoverability, that must be dealt with when the system is entirely in memory. I suspect you’ve heard the term “ACID compliant”; the “D” stands for durability. It represents the notion that a transaction once committed will be durable or permanently recorded. Without creating a copy of the transaction somewhere other than in the memory of the affected system, you can’t recover the transaction and therefore cannot provide durability. Even in analytical systems the notion of durability is important because you need to be able to ensure that the data was loaded properly into the system.

I’ve seen three schemes for dealing with the durability issue. Each has advantages and challenges:
1) Write data to disk as well as putting it in memory. The challenge here is whether you can write to the disk fast enough to keep up with the data that is being loaded into memory.
2) Put the data in memory on two different machines. The risk here is if both machines go down, you lose the data.
3) Use a combination of #1 and #2 above. Putting data in-memory on two machines provides some level of protection that allows time for a background process or asynchronous process to write data out to disk. In this case you need to understand what scheme a vendor is using and whether it meets your service level agreements.
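
Scheme #1 can be sketched as an in-memory store that appends every write to a log file before applying it. This is a minimal Python illustration under stated assumptions, not any vendor's implementation: the class name DurableStore, the JSON-lines log format and the choice to call fsync on every write are all hypothetical, and a production system would also have to cope with a partially written last record.

```python
import json
import os


class DurableStore:
    """In-memory dict whose writes are appended to a log file first."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        """Log the write and force it to disk before applying it in memory."""
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # the slow step that scheme #1 must keep up with
        self.data[key] = value

    def close(self):
        self.log.close()
```

The fsync call is where the challenge named in #1 shows up: each write waits for the disk. Scheme #3 relaxes exactly that step by letting a second machine hold the data while the disk write happens in the background.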

In some streaming applications the history and recoverability are left to other systems (such as the system of record) and the operations on the streaming data are allowed to operate “without a net,” so to speak. This method assumes that if the system goes down, you can live without the data that was in transit – either because it will be recovered elsewhere or because it wasn’t important enough to keep. An example might be stock-trading data being analyzed with an in-memory complex event processing system. If the CEP system crashes, the quotes being analyzed could be recovered from the exchange that generated them.

Another issue is that memory is much more expensive than spinning disk. In considering the enormous and ever-increasing volumes of data produced and consumed in the world of analytics, cost could be a significant obstacle. In the future, cost structures may change, but for the near term, memory still exacts a premium relative to the same quantity of disk storage. As a result memory-based systems need to be as efficient as possible and fit as much data as possible into memory. Toward this end, many in-memory analytic systems use columnar representation because it offers a compact representation of the data. Thus the key issue here as you compare vendors is to understand how much memory each requires to represent your data. Remember to take into consideration temp space or working space for each user.
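
To see why a columnar layout can be compact, consider dictionary encoding, one common technique for columns with many repeated values. The Python sketch below is an assumption-laden illustration, not a claim about what any particular vendor does; the function names and the 16-bit code width are hypothetical choices.

```python
from array import array


def dictionary_encode(column):
    """Replace each value with a small integer code into a list of distinct values."""
    codes = array("H")   # unsigned short codes: assumes fewer than 65,536 distinct values
    lookup = {}          # value -> code
    distinct = []        # code -> value
    for value in column:
        if value not in lookup:
            lookup[value] = len(distinct)
            distinct.append(value)
        codes.append(lookup[value])
    return distinct, codes


def dictionary_decode(distinct, codes):
    """Recover the original column from the distinct values and the codes."""
    return [distinct[c] for c in codes]


# Hypothetical column of region names with heavy repetition.
regions = ["East", "West", "East", "East", "West", "East"]
distinct, codes = dictionary_encode(regions)
```

Each repeated string is stored once, and every row costs only one small integer. Because all values in a column share a type and tend to repeat, this kind of encoding works far better on columns than on whole rows.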

I think the technology market understands accelerating DBMS and CEP operations, as we found in our benchmark research on Operational Intelligence and Complex Event Processing, but I doubt that it fully understands how in-memory technology can transform calculations, modeling and simulations over large amounts of data. Today’s CPUs are very good (and fast) at performing calculations, as millions of users know from their work with spreadsheets. An overwhelming majority (84%) of organizations in our benchmark research on BI and performance management said it is important or very important to add planning and forecasting to their efforts, and these activities are calculation-intensive.

Applying spreadsheet-style calculations to large amounts of data is a challenge. Often you run out of memory or performance is so poor that it is unusable. Relational databases are another obstacle. Performing spreadsheet-type calculations on data in relational databases is difficult because each row in an RDBMS is independent of every other row. In performing simple interrow calculations (for example, computing next year’s projected sales as a function of this year’s sales) you could be accessing two entirely different portions of the database and therefore different portions of the disk. Multiply that simple example by hundreds or thousands of formulas needed to model your business operations and you can see how you might have to have the whole database in-memory to get reasonable performance. Assuming you can fit the data in-memory, these types of seemingly random calculation dependencies can be handled easily and efficiently. Remember, RAM stands for random-access memory.
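
The projected-sales example can be written as a few lines of Python once the data sits in memory. This is a minimal sketch with made-up names and figures: project_sales, the (product, year) keys and the growth rates are all hypothetical, and the function assumes every product has a row for base_year.

```python
def project_sales(sales, growth, base_year, horizon):
    """Fill in future years, each computed from the previous year's row.

    `sales` maps (product, year) to an amount; `growth` maps product to a rate.
    """
    projected = dict(sales)
    for product in {p for p, _ in sales}:
        for year in range(base_year + 1, base_year + horizon + 1):
            # Random access into another "row": last year's figure.
            previous = projected[(product, year - 1)]
            projected[(product, year)] = previous * (1 + growth[product])
    return projected


# Hypothetical inputs: one product, 50% annual growth, two years projected.
actuals = {("widgets", 2010): 100.0}
rates = {"widgets": 0.5}
plan = project_sales(actuals, rates, base_year=2010, horizon=2)
```

Each formula reaches for a different row by key, and in memory that lookup costs about the same wherever the row lives. On disk, the same access pattern would turn into scattered reads.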

The next consideration is that most in-memory databases used for analytics do not scale across machines. SAP’s HANA may change some of that. Tibco’s ActiveSpaces has promise as well. I’ll be interested to see who tackles this challenge first, but I don’t believe the internode communications infrastructure exists yet to make random access of data across nodes feasible. So for the time being, calculation models will most likely need to be confined to data located on a single machine to deliver reasonable performance. It’s clear that in-memory databases can provide needed benefits, but they will have to handle these challenges before wide adoption becomes likely.

Let me know your thoughts or come and collaborate with me on Facebook, LinkedIn and Twitter.

Regards,

David Menninger – VP & Research Director
