Data preparation is critical to the effectiveness of both operational and analytic business processes. Operational processes today are fed by streams of constantly generated data. Our data and analytics in the cloud benchmark research shows that more than half (55%) of organizations spend the most time in their analytic processes preparing data for analysis – a situation that reduces their productivity. Data now comes from more sources than ever, at a faster pace and in a dizzying array of formats; it often contains inconsistencies in both structure and content.

In response to these changing information conditions, data preparation technology is evolving. Big data, data science, streaming data and self-service all are impacting the way organizations collect and prepare data. Data sources used in analytic processes now include cloud-based data and external data. Many data sources now include large amounts of unstructured data, in contrast to just a few years ago when most organizations focused primarily on structured data. Our big data analytics benchmark research shows that nearly half (49%) include unstructured content such as documents or Web pages in their analyses.

The ways in which data is stored in organizations are changing as well. Historically, data was extracted, transformed and loaded, and only then made available to end users through data warehouses or data marts. Now data warehouses are being supplemented with, or in some cases replaced by, data lakes, which I have written about. As a result, the data preparation process may involve not just loading raw information into a data lake, but also retrieving and refining information from it.

The advent of big data technologies such as Hadoop and NoSQL databases intensifies the need to apply data science techniques to make sense of these volumes of information. In this case querying and reporting over such large amounts of information are both inefficient and ineffective analytical techniques. And using data science means addressing additional data preparation requirements such as normalizing, sampling, binning and dealing with missing or outlying values. For example, in our next-generation predictive analytics benchmark research 83 percent of organizations reported using sampling in preparing their analyses. Data scientists also frequently use sandboxes – copies of the data that can be manipulated without impacting operational processes or production data sources. Managing sandboxes adds yet another challenge to the data preparation process.
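
The preparation steps named above (dealing with missing or outlying values, normalizing, binning and sampling) can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names, the median fill, the Tukey-fence rule for outliers and the equal-width binning are assumptions chosen for the example, not methods drawn from the research.

```python
import random
import statistics


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clamp values to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]
    width = (high - low) / n_bins
    # The maximum value would land in bin n_bins, so cap the index.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def sample(values, fraction, seed=0):
    """Draw a reproducible simple random sample without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(values) * fraction))
    return rng.sample(values, k)
```

In practice these steps would usually be chained, for example filling missing values before clipping outliers and normalizing, and applied to a sandbox copy rather than to production data.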

Data governance is always a challenge; in this new world it has if anything grown even more difficult as the volume and variety of data grow. At the moment most big data technologies trail their relational database counterparts in providing data governance capabilities. The developers of data preparation processes must adapt them to these new environments, supplementing them with processes that support governance and compliance of personally identifiable information (PII), payment card information (PCI), protected health information (PHI) and other standards for the handling of sensitive, restricted data.

In the emerging self-service approach to data preparation, three separate user personas typically are involved. Operational teams need to derive useful information from data as soon as it is generated to complete business transactions and keep operations flowing smoothly. Analysts need access to relevant information to guide better decision-making. And the IT organization is often called upon to support either or both of these roles when the complexities of data access and preparation exceed the skills of those in the lines of business. While IT departments probably welcome the opportunity to enable end users to perform more self-service tasks, they cannot do so in ways that ignore enterprise requirements. Nonetheless, the trend toward deploying tools that support self-service data preparation is growing. The growing demand for self-service and the need to meet enterprise requirements can lead to conflict for organizations that want to derive maximum business value from their data as quickly as possible while still maintaining appropriate data governance, security and consistency.

To help understand how organizations are tackling these changes, Ventana Research is conducting benchmark research on data preparation. This research will identify existing and planned approaches and related technologies, best practices for implementing them and market trends in data preparation. It will assess the current challenges associated with innovations in data preparation, including self-service capabilities and architectures that support big data environments. The research will assess the extent to which tools and processes for data preparation support superior performance and determine how organizations balance the demand for self-service capabilities with enterprise requirements for data governance and repeatability. It will uncover ways in which data preparation and supporting technologies are being used to enhance operational and analytic processes.

This research also will provide new insights into the changes now occurring in business and IT functions as organizations seek to capitalize on data preparation to gain competitive advantage and help with regulatory compliance and risk management and governance processes. The research will investigate how organizations are implementing data preparation tools to support all types of operational and business processes including operational intelligence, business intelligence and data science.

Data is an essential component of every aspect of business, and organizations that use it well are likely to gain advantages over competitors that do not. Watch our community for updates. We expect the research to reveal impactful insights that will help business and IT. When it is complete, we’ll share education and best practices about how organizations can tackle these challenges and opportunities.


David Menninger

SVP & Research Director

Follow Me on Twitter @dmenningerVR and Connect with me on LinkedIn.

Qlik helped pioneer the visual discovery market with its QlikView product. In some respects, Qlik and its competitors also spawned the self-service trend rippling through the analytics market today. Their aim was to enable business users to perform analytics for themselves rather than to build a product with the perfect set of features for IT. After establishing success with end users the company began to address more of the concerns of IT, eventually creating a robust enterprise-grade analytics platform. This approach has worked for Qlik, driving growth that led to an initial public offering in 2010. The company now generates more than half a billion dollars in revenue annually, making it one of the largest independent analytics vendors. Based on its company and products, Qlik was rated a Hot Vendor in our 2015 Value Index on Analytics and Business Intelligence and was one of the highest ranked in usability.

However, as Qlik was experiencing that dramatic growth, the analytics market was changing from a Windows-based, desktop platform to a mobile, cloud-based one. As a result of these market shifts, a couple of years ago the company introduced the Qlik Sense product line to offer a modern, cloud-based platform for its analytics. Thus the company embraced a two-product strategy consisting of QlikView and Qlik Sense, which my colleague Mark Smith wrote about earlier this year. When Qlik introduced this split in product lines, some customers had questions about whether it would continue to invest in QlikView. Any questions I had about both parts of its product strategy were answered a few weeks ago at Qonnections, its annual user conference – both by company executives and in my conversations with customers.

Qlik has continued its support of and investment in the QlikView product line and will provide annual updates to the product, which is now on version 12. Customers who are happy with their QlikView implementations – and I spoke with several at the conference – can continue to use the product and can expect enhancements, albeit less frequently than updates for the Qlik Sense product line. However, since QlikView and Qlik Sense share the same QIX analytics engine, customers can begin to make the transition to Qlik Sense without giving up their QlikView applications.

The company also introduced Qlik Sense 3.0, which is now generally available. It includes new features for self-service data preparation, enhanced search capabilities and an expanded set of application programming interfaces (APIs). The new data preparation features follow an industry trend toward providing more self-service capabilities for end users. Data preparation remains a challenge for many organizations. Our benchmark research on data and analytics in the cloud shows that this activity is where the majority (55%) of organizations spend the most time in their analytics process. Qlik has done a nice job here. Its user interface is intuitive, using a “connected bubbles” metaphor. Data sets show up as bubbles and can be joined graphically to other data sets or bubbles. The software automatically detects the join field based on profiling of the data involved. Other products have used drag-and-drop techniques with an automatic suggestion of join fields, but Qlik has made the visuals more appealing and easier to work with. Date fields and geographic fields are also detected during the profiling process, automating more of the steps involved in working with these types of fields. The new version also includes a graphical interface for defining derived or calculated fields.
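
To give a feel for how a join field might be suggested from data profiling, here is a minimal Python sketch. It is not Qlik's algorithm, which is not described here; the function names, the value-overlap score and the 0.5 threshold are all assumptions made for illustration. The idea is simply to profile each column's distinct values and propose the pair of columns whose values overlap the most.

```python
def profile(table):
    """Map each column name to its set of distinct non-null values.

    `table` is a dict of column name -> list of values (hypothetical format).
    """
    return {col: {v for v in vals if v is not None} for col, vals in table.items()}


def suggest_join(left, right, min_score=0.5):
    """Suggest (left_column, right_column, score) for the best-overlapping pair.

    The score is the share of the smaller column's distinct values that also
    appear in the other column. Returns None if no pair reaches min_score.
    """
    left_profile, right_profile = profile(left), profile(right)
    best = None
    for left_col, left_vals in left_profile.items():
        for right_col, right_vals in right_profile.items():
            if not left_vals or not right_vals:
                continue  # an empty column cannot be a join key
            shared = len(left_vals & right_vals)
            score = shared / min(len(left_vals), len(right_vals))
            if score >= min_score and (best is None or score > best[2]):
                best = (left_col, right_col, score)
    return best


# Hypothetical example data: two small data sets sharing a customer key.
orders = {"order_id": [1, 2, 3], "cust": ["a", "b", "a"]}
customers = {"id": ["a", "b", "c"], "name": ["Ann", "Bob", "Cy"]}
suggestion = suggest_join(orders, customers)
```

A production tool would likely weigh more signals, such as column names, data types and key uniqueness, but value overlap conveys the basic idea of profiling-driven suggestions.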

The search capabilities, historically a strength for Qlik, have been extended to include metadata and charts. Users can search for a particular measure such as profit by region and see thumbnails of the charts and graphs that reference this measure. Qlik refers to this feature as “visual search.” Seeing the thumbnails provides more context and should make it easier to find the appropriate measure or visualization quickly.

Qlik Sense 3 has bidirectional language support as well as more international versions. With this release the company has officially added support for Korean, Polish, traditional Chinese and Turkish in addition to 11 other languages already supported.

Outside of the Qlik Sense product improvements, the company also supports more connectors to additional data sources as a result of its acquisition of Industrial CodeBox announced at Qonnections. Users now have direct connectivity to Twitter, Facebook, Google, Microsoft Dynamics CRM and Sugar CRM data. In addition to connectors, Qlik DataMarket provides access to a variety of free and subscription-based external data sources that can be used as part of an organization’s analytics. The new data sources include a financial services package with data from 35 major stock exchanges and indices including quote data and financial statement data from publicly traded companies.

The company also continues to invest in cloud-based analytics. Our research shows that two-thirds (67%) of organizations use cloud-based analytics today or expect to within 12 months. Later this year Qlik will extend its cloud offerings to include Qlik Sense Cloud Business. Previously the company had introduced Qlik Sense Cloud Basic, a free version for individual usage, and Qlik Sense Cloud Plus, which allows sharing of analyses with up to five individuals. The Business version will provide departmental and small business support with sharing of analyses among selected groups or individuals within an organization.

On an entirely different front, in early June the company announced that it has agreed to be acquired by private equity firm Thoma Bravo. This is the latest in a spate of public technology companies being acquired by private equity firms. Tibco, Informatica and EMC are at various stages of going down a similar route. The transition to cloud-based products may be part of what is driving Qlik to go private. Cloud products are generally delivered on a subscription basis, which produces less revenue recognition up front, and it is difficult for a public company to meet the market’s revenue and profitability expectations as it transitions away from large enterprise license deals with lots of upfront revenue.

Due to standard regulatory restrictions, the companies can’t say much about the acquisition and subsequent plans other than that the deal is expected to close in the third quarter of 2016. These restrictions contrast with Qlik’s public disclosure of its product roadmap, which not many software companies do. It is helpful for customers to understand how the products might evolve over the next 18 to 24 months.

In terms of future developments, users could benefit from more investment by Qlik and its new owners in collaboration and mobile capabilities. A few years ago I noted that Qlik experimented with supporting collaboration capabilities like chat streams and sharing analytic displays, but these features have fallen by the wayside. On the mobile front, Qlik is in the middle of transitioning from QlikView Mobile delivered as a native app on mobile devices to Qlik Sense mobile capabilities delivered via HTML5. As a result, there are some gaps, at least temporarily, between the two sets of products.

Overall, Qlik has continued to demonstrate an ability to design and deliver products that are visually appealing and excel in ease of use. Qlik Sense 3.0 includes additional capabilities that will help users understand and analyze their data in a pure browser-based product accessible from the cloud and mobile devices. If you haven’t considered Qlik in the past, perhaps the new release is a good reason to consider it now.


David Menninger

SVP & Research Director

Follow Me on Twitter @dmenningerVR and Connect with me on LinkedIn.
