BSc: In-Memory Datawarehousing (MS2012-03)

posted Mar 29, 2012, 5:51 AM by Marco Spruit   [ updated Apr 8, 2013, 8:43 AM ]
This is a paid internship at a BI implementation vendor.

As the volume of data that organizations want to analyze and report on keeps growing, performance must keep pace. Users still expect response times comparable to those they experienced when analyzing far less data. As a first step, software and hardware vendors have enabled analysis on in-memory databases. This type of solution sits close to the end user and often contains the data marts with the information relevant to them.

The trend towards in-memory databases and appliances is likely to continue, so in-memory technology will also become an option for 'traditional' data warehouses. The purpose of this study is therefore to provide insight into the challenges that need to be overcome before data warehouses can successfully integrate in-memory technology. This assignment mostly focuses on the Transform and Load aspects of the ETL phase in the typical BI architecture.

Some important questions:
  • Do the current loading strategies still work (in the context of ETL)? 
  • Are there other modeling choices to be made? 
  • Is column-based data storage necessary? 
  • How should fail-over and crash recovery be handled? 
  • Which situational factors are relevant (e.g. up to what size of today's conventional DWH can in-memory technology run, and at what price)? 

Approach

An important part of the research will consist of a literature study (including web sources) into most of these questions. Depending on the student's affinity with ETL tooling, it is also possible to use an ETL tool to compare different loading and modeling strategies with traditional methods.

Deliverable

A framework, method, model, or matrix that integrates literature findings and detailed case studies with experts. The exact form is to be determined.
