Data Engineering with Avishkar: In-memory BI

August 08, 2014

In-memory BI

In-memory BI gained popularity largely due to the success of QlikTech, provider of the in-memory-based QlikView BI product. Following QlikTech’s lead, many other BI vendors have jumped on the in-memory “hype wagon,” including the software giant, Microsoft, which has been aggressively marketing PowerPivot, their own in-memory database engine.

The concept of in-memory business intelligence is not new; it has been around for many years. It became widely known only recently because it was not feasible before 64-bit computing became commonly available. A 32-bit processor can typically address at most 4GB of RAM, which is hardly enough to accommodate even the simplest of multi-user BI solutions. Only when 64-bit systems became cheap enough did it become possible to consider in-memory technology as a practical option for BI.
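
The 4GB ceiling follows directly from the width of a memory address. As a rough sketch, assuming byte-addressable memory and a flat address space (real limits also depend on the operating system and hardware), the arithmetic looks like this. The function name is only illustrative.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Return how many GiB a flat, byte-addressable space of this width can reach."""
    return (2 ** address_bits) // GIB


print(addressable_gib(32))  # 4
print(addressable_gib(64))  # 17179869184
```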

When we talk about whether a database is disk-based or in-memory, we are talking about where the data resides while it is actively being queried by an application: with disk-based databases, the data is queried while stored on disk and with in-memory databases, the data being queried is first loaded into RAM.
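
To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a BI data store. The `sales` table and its values are made up for illustration. The same query runs first against a database file on disk and then against a copy loaded into RAM. SQLite caches pages on its own, so this shows where the data lives rather than a fair speed comparison.

```python
import os
import sqlite3
import tempfile

QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"


def build_disk_db(path):
    """Create a small on-disk database with a hypothetical 'sales' table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("west", 250.0), ("east", 50.0)],
    )
    conn.commit()
    return conn


def load_into_memory(disk_conn):
    """Copy the whole on-disk database into a RAM-resident one."""
    mem_conn = sqlite3.connect(":memory:")
    disk_conn.backup(mem_conn)  # Connection.backup needs Python 3.7+
    return mem_conn


def totals(conn):
    """Run the same aggregate query against whichever connection is given."""
    return dict(conn.execute(QUERY).fetchall())


def demo():
    """Query the disk-based database, then the in-memory copy."""
    with tempfile.TemporaryDirectory() as tmp:
        disk = build_disk_db(os.path.join(tmp, "sales.db"))
        mem = load_into_memory(disk)
        try:
            return totals(disk), totals(mem)
        finally:
            disk.close()
            mem.close()


disk_totals, mem_totals = demo()
print(disk_totals == mem_totals)  # True
```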

Relational databases are great as the backbone of operational applications such as CRM, ERP or Web sites, where transactions are inserted frequently and simultaneously. However, they are a poor choice for supporting analytic applications, which usually involve simultaneous retrieval of partial rows along with heavy calculations.
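
The "partial rows" problem is easier to see with a toy example. The sketch below uses plain Python lists and dictionaries, not a real database engine, and the order data is invented. In the row layout, summing one field means walking every complete record. In the column layout, the same aggregate only touches the one list it needs.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "customer": "Acme", "region": "east", "amount": 100.0},
    {"order_id": 2, "customer": "Zenith", "region": "west", "amount": 250.0},
    {"order_id": 3, "customer": "Acme", "region": "east", "amount": 50.0},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout: one list per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_row_store(records):
    """Visits every full record even though only one field is needed."""
    return sum(record["amount"] for record in records)


def sum_amount_column_store(columns):
    """Reads only the 'amount' column and ignores the others."""
    return sum(columns["amount"])


columns = to_columns(rows)
print(sum_amount_row_store(rows))        # 400.0
print(sum_amount_column_store(columns))  # 400.0
```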

The “traditional best-practice” BI has been based on the following architecture:

We start with business applications that gather the data we would like to analyze.
We create a copy that’s typically called an “operational data store” or ODS, then we use ETL (extraction, transformation, and loading) technology to load data into database structures optimized for business intelligence – a data mart or data warehouse.
To provide better interactivity, an additional data cache is often created for a particular report or cube.
Because this architecture is slow and unwieldy, organizations often create extra data marts for a particular business need.
The result is a vacuum tube: it works, and it’s the best alternative we have right now, but it’s slow, complex, and expensive.
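
The ETL step in this architecture can be sketched in a few lines. This is only an illustration: the source rows, the `fact_sales` table, and the function names are all hypothetical, and SQLite stands in for a real data mart.

```python
import sqlite3

# Hypothetical rows as an operational system might export them.
SOURCE_ROWS = [
    {"order_id": "1", "region": " East ", "amount": "100.50"},
    {"order_id": "2", "region": "west", "amount": "250.00"},
    {"order_id": "3", "region": "EAST", "amount": "49.50"},
]


def extract():
    """Extract: take a copy of the operational data."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: fix types and normalize inconsistent region names."""
    return [
        (int(r["order_id"]), r["region"].strip().lower(), float(r["amount"]))
        for r in rows
    ]


def load(records):
    """Load: write the cleaned records into a structure built for reporting."""
    mart = sqlite3.connect(":memory:")  # stand-in for a data mart
    mart.execute(
        "CREATE TABLE fact_sales (order_id INTEGER, region TEXT, amount REAL)"
    )
    mart.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", records)
    mart.commit()
    return mart


mart = load(transform(extract()))
report = mart.execute(
    "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
).fetchall()
print(report)  # [('east', 150.0), ('west', 250.0)]
```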

“In Memory” BI is based on the following architecture:

“In Memory” BI stores all the required data in memory.
Operations run with the support of massively parallel processing.
Column databases are very well adapted to parallelization: because each column is stored in a separate area of memory, aggregates can be efficiently handed off to a separate processor, or even partitioned across several processors.
With a column database, data loading times are no longer a problem.
64-bit addressing has radically increased how easily data can be accessed.
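
Partitioning an aggregate across several workers can be sketched as follows. The helper names are illustrative, and a thread pool is used only to keep the example short. In CPython, threads do not give true CPU parallelism for this kind of work, so treat this as a picture of the partition-then-combine structure rather than a demonstration of speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, parts):
    """Split a column into roughly equal contiguous chunks."""
    size = max(1, -(-len(values) // parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(column, workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    chunks = partition(column, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


amounts = list(range(1, 1001))  # a single in-memory column
print(parallel_sum(amounts))  # 500500
```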

QlikView is a relatively non-invasive, business-focused product that uses in-memory technology to present data in an associative architecture. QlikView is a technology aimed at helping analysts and business people get insights from information as part of a highly visual process. When business users explore data in QlikView, they use associations and patterns that are more like those in the human mind than in a traditional database, releasing users from the limitations of those databases. QlikView’s claim to value is that it gets business users working quickly, collaboratively and flexibly, without making business intelligence an IT project in and of itself. The idea is that, through associative search, and displays that gray out (but still show) data that doesn’t fit in the framework of the query, users can not only answer questions quickly, but can even find the answers to the questions they didn’t ask — but should have. Its accelerated and accessible brand of business intelligence is called “business discovery.”
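
The "gray out but still show" idea can be illustrated with a toy model. To be clear, this is not QlikView's implementation or API, just a sketch of the behavior described above, using made-up data: after a selection in one field, every value in the other fields is marked as either associated with the selection or excluded from it, and nothing is hidden.

```python
def associate(rows, field, selected_value):
    """For each other field, split its values into associated and excluded."""
    matching = [r for r in rows if r[field] == selected_value]
    result = {}
    for other in rows[0].keys():
        if other == field:
            continue
        all_values = {r[other] for r in rows}
        associated = {r[other] for r in matching}
        result[other] = {
            "associated": sorted(associated),
            # Excluded values would be shown grayed out, not removed.
            "excluded": sorted(all_values - associated),
        }
    return result


rows = [
    {"customer": "Acme", "product": "Bolt", "region": "East"},
    {"customer": "Acme", "product": "Nut", "region": "East"},
    {"customer": "Zenith", "product": "Gear", "region": "West"},
]

view = associate(rows, "customer", "Acme")
print(view["product"])  # {'associated': ['Bolt', 'Nut'], 'excluded': ['Gear']}
print(view["region"])   # {'associated': ['East'], 'excluded': ['West']}
```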

The beauty of it is that with QlikView:

Metadata management is optional and pervasive. QlikView customers use metadata only when and where it adds value. QlikView creates metadata automatically. Whether or not it is used is up to the designers and developers.
The focus is on QlikView itself. QlikView’s metadata focus is on helping stakeholders understand and manage the QlikView environment. Developers and designers get a clear picture of how well their QlikView applications were built and gain insights that help them maintain their applications.
Developers can introduce metadata usage over time. With QlikView, developers do not have to create a metadata layer ahead of time. They can define and collect metadata after they have created, tested, and even deployed applications. Project teams can instead focus on getting the right business information and analytical tools in the hands of the right people at the right time.
