Data warehouse is MUST
What is data warehouse? Everybody will face the same question before what is exactly data warehouse. Data is collected and integrated from heterogeneous sources. which can be used for decision making, taking out a certain pattern, making reports etc.
A data warehouse is separate from the operational database. It cannot be frequently updated.
Operational databases are something where day to day transaction takes place, they are completely separate from Data warehouse. Operational databases they contain current data of the organization. Where data warehouse contains historical data.
It’s very risky to do changes on operational data you need to be very careful, one wrong change in operational data can cost a company in billions.
An operational database is constructed for well-known tasks and workloads such as searching particular records, indexing, etc. In contract, data warehouse queries are often complex and they present a general form of data.
Operational databases support concurrent processing of multiple transactions. Concurrency control and recovery mechanisms are required for operational databases to ensure robustness and consistency of the database.
An operational database query allows to read and modify operations, while an OLAP query needs only read-only access of stored data.
Data warehouse Architecture
Data mart is data warehouse specific to the module. Suppose sales data, we can have a data warehouse which only specific to sales data. Data marts are only related to the specific module in the organization. Their life span of implementation is small. They are small in size. They related to specific groups.
All the data which is collected and integrated is loaded into this tier. Now, what is extract, transform and load do?
The data which is collected from multiple sources is checked is it a redundant data? Is it full proof? Only required data is extracted from these various sources.
Transformation is nothing but cleaning the data, checking accurateness of data, reliable data is gathered some operations are performed like if I want a data from past two years of highest paid employees. Then from the extracted data, you perform SQL operations on it and only required data is transformed in well manner form.
The load is nothing but all the data which is transformed and required is loaded into the data warehouse.
In the middle tier, we have the OLAP Server that can be implemented in either of the following ways.
By Relational OLAP (ROLAP), which is an extended relational database management system. The ROLAP maps the operations on multidimensional data to standard relational operations.
By Multidimensional OLAP (MOLAP) model, which directly implements the multidimensional data and operations.
In top-tier you can make reports to the higher authorities for making reports, you can detect certain analysis pattern, observing trends and make potential reports for effective decisions.
Online Analytical Processing Server (OLAP) is based on the multidimensional data model. It allows managers, and analysts to get an insight of the information through fast, consistent, and interactive access to information. This chapter covers the types of OLAP, operations on OLAP, a difference between OLAP, and statistical databases and OLTP.
There are 4 types OLAP Operations
- Slice/ Dice
Roll up operation is going up in the hierarchy. It aggregates the data. Reduces dimensions.
As we can see in the above cube, form different cities of Canada and USA, it gives us only two countries, more aggregate data on mobile modem phone security items.
Drill down is an exact reverse process of roll-up, it is the concept of going up in the hierarchy of dimension, It’ s a concept of generalization.
It gives us a more granular view of data.
The slice operation selects one particular dimension from a given cube and provides a new sub-cube.
If we consider first quarter q1, we can see the sales of mobile items in the first quarter for Chicago, New York, Toronto, and Vancouver.
This is how it gives more detail view on one dimension.
It selects more than one dimensions from a given cube and makes new sub-cube out of it.
As we can see in the above diagram, we have selected Toronto and Vancouver for 1st and 2nd quarter, it makes us completely focus on these two cities sales of mobile modem phone security.
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of data. Consider the following diagram that shows the pivot operation.