
Kalido recently introduced version 9 of its Information Engine product. The company has been around for 10 years but has had difficulty establishing its identity in the information management market. Kalido was perhaps ahead of its time, partly a vendor of data integration, partly master data management and partly data governance. As an example of the positioning challenge, its core product, Information Engine, while not a data integration tool, could in some cases provide sufficient capabilities to meet an organization’s data integration needs. Its real value, however, comes from authoring and management of information about the user’s data warehouse.

Information Engine introduces an abstraction layer that separates the physical design of a warehouse from its logical design. Its repository holds information about the data model used in data warehouses and data marts, as well as the associated processes of managing the warehouse life cycle. This includes information about measures, hierarchies, aggregates, change management routines, security and auditing. By looking at the data warehouse as a process rather than a physical implementation of a data model, Kalido can help organizations manage processes that enhance data governance. For example, workflows with approvals and audit trails are a natural by-product of this process-based approach.

With version 9, Kalido continues to speed up data warehouse implementations. It pushes more of the processing down into the underlying database, which supports extract, load and transform (ELT) processes rather than the more conventional extract, transform and load (ETL). Doing more processing in the database using ELT eliminates the need to move the data twice: once to a transformation engine and then again to the data warehouse. The key change to support ELT in the new release is the introduction of staging tables, where data can land and be transformed before being loaded into the appropriate data warehouse tables. Version 9 also has more data integration features and additional testing capabilities.
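To make the ELT pattern concrete, here is a minimal sketch of the general technique, not Kalido's implementation. It uses Python's built-in SQLite module as a stand-in database, and the table names (`stg_sales`, `fact_sales`) and sample rows are hypothetical. Raw data lands in a staging table unchanged, and the database itself does the cleansing and aggregation on the way into the warehouse table.

```python
import sqlite3


def run_elt(rows):
    """Load raw rows into a staging table, then transform inside the database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_sales (region TEXT, amount TEXT);          -- hypothetical staging table
        CREATE TABLE fact_sales (region TEXT PRIMARY KEY,
                                 total_amount REAL);                -- hypothetical warehouse table
    """)
    # Extract + Load: raw values land in staging untouched.
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", rows)
    # Transform: the database trims, casts, filters and aggregates in one SQL step,
    # so the data never leaves the database for a separate transformation engine.
    conn.execute("""
        INSERT INTO fact_sales (region, total_amount)
        SELECT UPPER(TRIM(region)), SUM(CAST(amount AS REAL))
        FROM stg_sales
        WHERE TRIM(amount) <> ''
        GROUP BY UPPER(TRIM(region))
    """)
    conn.commit()
    result = conn.execute(
        "SELECT region, total_amount FROM fact_sales ORDER BY region"
    ).fetchall()
    conn.close()
    return result


# Illustrative input with inconsistent casing, stray spaces and one empty amount.
RAW_ROWS = [(" east ", "10.5"), ("East", "4.5"), ("west", "7"), ("west", "")]
FACT_ROWS = run_elt(RAW_ROWS)
print(FACT_ROWS)
```

In a conventional ETL flow the trimming and casting would happen in a separate engine before the load; here the only data movement is the single insert into staging.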

Kalido also offers master data management (MDM) capabilities across multiple domains, derived from the process-driven approach of Information Engine. Kalido MDM provides separate interfaces for data stewards as well as users of the master data. Data stewards, who oversee the master data processes, can define and perform data matching, identity resolution, validation and publication. Kalido provides connectors to Trillium and DataFlux software for external data validation and claims to be building them for products from others. Users of the master data can search through it, browse the data model and issue change requests.

The most interesting aspect of the company’s process-driven approach is the ability to capture and apply data governance policies. As information management capabilities mature, organizations can focus more attention on data governance. Kalido has recognized this opportunity, and while for years its messaging has included data governance, only last year did it introduce Kalido Data Governance Director as a separate product. Data Governance Director uses a policy management metaphor in which organizations define their data governance policies as well as metrics to measure whether the policies are being enforced. Our benchmark research into data governance found that designing and maintaining policies and rules was the top objective for data governance in 75 percent of organizations, and it also revealed a lack of satisfaction with current approaches. The research also found that a lack of sufficient policies was one of the top barriers to achieving a single version of data to leverage across the enterprise.

We are currently conducting benchmark research on trends in information management to help us understand whether interest in data governance has risen, and to determine the relative priorities of other information management processes, including master data management, data integration and data quality. I expect we’ll see rising interest in data governance, which could bode well for products such as Kalido’s Data Governance Director.

One of the challenges Kalido still faces is communicating its positioning clearly to the market. Information Engine 9 includes data integration features that make Kalido more competitive, yet the company does not attempt to compete directly in the data integration market – nor do I think it should. I would prefer to see more partnerships with those vendors, which would allow Kalido to focus where it can add the most value: managing the processes associated with data warehousing. In particular, Data Governance Director represents a unique approach that’s worth exploring. Even if your organization isn’t ready to purchase the product, you can probably learn something useful about data governance that you can apply to your own processes.

Regards,

David Menninger – VP & Research Director

Karmasphere recently introduced version 1.5 of its Analyst product, which helps organizations analyze “big data” stored in Hadoop, the open source large-scale data processing technology. An independent software vendor focused exclusively on the Hadoop market, Karmasphere made available a community edition of its developer product in September 2009 and launched the company in March 2010. Since then it has been active and visible in Hadoop-related events including Hadoop World, the IBM Big Data Symposium and others.

Fundamentally, Karmasphere focuses on making Hadoop easier to use and more accessible for both developers and analysts, who need help in this area. Our recent benchmark research on Hadoop and Information Management shows a significant shortage of skills: Hadoop users cited staffing and training as the two most significant obstacles in analyzing large scale data sets, impacting 80% and 74% of organizations, respectively.

Karmasphere Analyst 1.5 provides an interactive, graphical environment for analyzing data in Hadoop. To begin the process, it helps users understand the data structures available in Hadoop by presenting a table-based view of existing data and the ability to create new tables. In addition, Karmasphere Analyst combines information from multiple Hadoop data stores to present a unified view. Users assemble queries with a SQL-based development environment that includes syntax checking and prompts to help in the process. More than 100 user-defined functions (UDFs) are included for many common tasks and analyses. Once assembled, these queries can be stored, reused and combined together into a “query chain” or workflow involving multiple steps that are often necessary in the data preparation and analysis process. Karmasphere Analyst provides visual query plans and explanations that make it easier to understand and modify the queries. Users also can visualize the results of queries in graphical or tabular displays.
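The “query chain” idea can be illustrated with a short sketch. This is not Karmasphere's API or Hive syntax; it is a generic illustration that assumes SQLite (via Python's standard library) as a stand-in SQL engine, and the step names, the `raw_events` table and the sample data are all hypothetical. Each stored query materializes a table that the next step reads, which is the essence of a multi-step data preparation workflow.

```python
import sqlite3

# Each step is a named, reusable query; later steps read tables built by earlier ones.
QUERY_CHAIN = [
    ("clean_events",
     "SELECT user_id, LOWER(action) AS action FROM raw_events WHERE user_id IS NOT NULL"),
    ("purchases",
     "SELECT user_id FROM clean_events WHERE action = 'purchase'"),
    ("purchases_per_user",
     "SELECT user_id, COUNT(*) AS purchase_count FROM purchases GROUP BY user_id"),
]


def run_chain(conn, chain):
    """Run each step in order, materializing its result under the step's name."""
    for name, sql in chain:
        conn.execute(f"CREATE TABLE {name} AS {sql}")
    final_table = chain[-1][0]
    return conn.execute(f"SELECT * FROM {final_table} ORDER BY 1").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(1, "Purchase"), (1, "view"), (2, "PURCHASE"), (None, "purchase"), (1, "purchase")],
)
CHAIN_RESULT = run_chain(conn, QUERY_CHAIN)
print(CHAIN_RESULT)
```

Because every step leaves a named intermediate table behind, an analyst can inspect or reuse any stage without rerunning the whole chain.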

Later in the process, Karmasphere helps users prepare and move jobs into production. It includes embedded Hive and Hadoop capabilities for desktop prototyping so users can test and debug on their desktops. Then they can package and export the jobs for deployment to a cluster. Karmasphere also provides capabilities for monitoring jobs and optimizing job performance. It works with a variety of Hadoop sources including Amazon Elastic MapReduce, Apache, Cloudera, EMC Greenplum, IBM and MapR (http://www.mapr.com). Given the proliferation of sources for Hadoop, including the recently formed Hortonworks with its focus on Apache Hadoop, the ability to work with multiple versions could be valuable to organizations in the evaluation process and to those that have chosen to work with multiple versions, which is the case with nearly half the participants in our benchmark research cited above.

Karmasphere has carved out a niche in the big-data market where there are unmet needs. However, it will face competition from bigger vendors as they incorporate features into their business intelligence and information management platforms that make it easier to work with Hadoop. One way Karmasphere could maintain a unique position would be to broaden its capabilities for advanced analytics. Our research shows that 69% of organizations working with Hadoop use it for advanced analytics including data mining and predictive analytics. Another way Karmasphere could improve its position with respect to larger vendors would be to provide better integration with tools beyond Excel and Tableau, which it offers today.

In the meantime, if you work with Hadoop and are looking for ways to be more productive or empower a broader range of analysts, you can try some of Karmasphere’s features for yourself here.

Regards,

David Menninger – VP & Research Director
