Thursday, 13 October 2011

Data Warehousing Introduction

A data warehouse is a type of computer database that is responsible for collecting and storing the information of a particular organization. The goal of using a data warehouse is to have an efficient way of managing information and analyzing data.

Although data warehouses can be designed in a number of different ways, they share several important characteristics. Most data warehouses are subject-oriented. This means that the information in the data warehouse is organized around the real-world objects and events it describes, such as customers, products, or sales.

Another characteristic frequently seen in data warehouses is that they are time-variant: changes in the information are recorded so that they can be tracked over time. The information in a data warehouse is also non-volatile. This means that it is not deleted or overwritten once loaded, and is kept so that it can be analyzed in the future. Finally, a data warehouse is integrated: the data produced by all of the programs used by a particular institution is stored in the warehouse and combined into a consistent whole. The first data warehouses were developed in the 1980s. As societies entered the information age, there was a large demand for efficient methods of storing information.
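The time-variant and non-volatile properties described above can be illustrated with a small sketch. This is only an illustration under assumed names: `PriceRecord`, `PriceHistory`, and the sample products and prices are hypothetical and do not come from any particular warehouse product. The idea is that a change is stored as a new dated record, while older records are never modified or removed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceRecord:
    """One immutable, dated fact about a product (hypothetical schema)."""
    product: str
    price: float
    valid_from: date


class PriceHistory:
    """Append-only store: records are added, never updated or deleted."""

    def __init__(self):
        self._records = []

    def record(self, product, price, valid_from):
        # Non-volatile: a price change becomes a new row; old rows stay as-is.
        self._records.append(PriceRecord(product, price, valid_from))

    def price_on(self, product, day):
        # Time-variant: look up the price that was in effect on a given day.
        candidates = [r for r in self._records
                      if r.product == product and r.valid_from <= day]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.valid_from).price

    def __len__(self):
        return len(self._records)


history = PriceHistory()
history.record("widget", 9.99, date(2011, 1, 1))
history.record("widget", 12.49, date(2011, 7, 1))

print(history.price_on("widget", date(2011, 3, 15)))   # price before the change
print(history.price_on("widget", date(2011, 10, 13)))  # price after the change
```

Because both records are kept, a question about any past date can still be answered after the price has changed.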

Many of the systems that existed in the 1980s were not powerful enough to store and manage large amounts of data. There were a number of reasons for this. The systems of the time took too long to process and report information, and many of them were not designed for analysis or reporting at all. In addition, the computer programs needed for reporting were both costly and slow. To solve these problems, companies began designing computer databases that placed an emphasis on managing and analyzing information. These were the first data warehouses, and they could obtain data from a variety of sources, including personal computers and mainframes.

Spreadsheet programs have also played an important role in the development of data warehouses. By the end of the 1990s, the technology had greatly advanced and was much lower in cost. It has continued to evolve to meet the demands of those who are looking for more functions and speed. Four advances in data warehouse technology have allowed it to evolve: offline operational databases, real time data warehouses, offline data warehouses, and integrated data warehouses.

The offline operational database is a system in which the information in the database of an operational system is copied to a separate, offline server. Reporting then runs against the copy, so the operational system itself performs at a much higher level. As the name implies, a real time data warehouse is updated every time an event occurs. For example, if a customer orders a product, a real time data warehouse will record the order as soon as it happens. The offline data warehouse is a database that is updated on a regular basis from an operational system.

With the integrated data warehouse, transactions will be transferred back to the operational systems each day, and this will allow the data to be analyzed easily by companies and organizations. A typical data warehouse is made up of a number of layers. Some of these are the source data layer, the transformation layer, the data warehouse layer, and the reporting layer. There are a number of different data sources for data warehouses. Some popular ones are Teradata, Oracle Database, and Microsoft SQL Server.
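The difference between an offline data warehouse that is refreshed on a schedule and a real time data warehouse that is updated on every event can be sketched as follows. The class names (`OperationalSystem`, `OfflineWarehouse`, `RealTimeWarehouse`) and the sample orders are hypothetical stand-ins chosen for illustration, not the interface of any real product.

```python
class OperationalSystem:
    """Stand-in for an order-entry system (hypothetical)."""

    def __init__(self):
        self.orders = []
        self.listeners = []

    def place_order(self, order):
        self.orders.append(order)
        # Push the new order to anything subscribed for immediate updates.
        for listener in self.listeners:
            listener(order)


class OfflineWarehouse:
    """Refreshed in batches: sees new orders only when load_batch() runs."""

    def __init__(self):
        self.rows = []
        self._loaded = 0  # how many source orders have been copied so far

    def load_batch(self, source):
        new_rows = source.orders[self._loaded:]
        self.rows.extend(new_rows)
        self._loaded = len(source.orders)
        return len(new_rows)


class RealTimeWarehouse:
    """Updated on every event by subscribing to the operational system."""

    def __init__(self, source):
        self.rows = []
        source.listeners.append(self.rows.append)


source = OperationalSystem()
offline = OfflineWarehouse()
realtime = RealTimeWarehouse(source)

source.place_order({"customer": "alice", "product": "widget"})
source.place_order({"customer": "bob", "product": "gadget"})

print(len(realtime.rows), len(offline.rows))  # 2 0
offline.load_batch(source)                    # e.g. the nightly refresh
print(len(realtime.rows), len(offline.rows))  # 2 2
```

Until the batch load runs, the offline warehouse lags behind the operational system, while the real time warehouse already holds both orders.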

Another important concept that is related to data warehouses is called data transformation. As the name suggests, data transformation is a process in which information transferred from specific sources is cleaned and loaded into a repository.

Data transformation can be either a manual or an automated process. Code can be written by hand, or an ETL (extract, transform, load) tool can be utilized. The component that is responsible for transforming the data will compare it to other systems. It will also place the data in a specific standard format. In addition to this, it will often be linked to other systems which can assist it. The goal of using a data warehouse is to store and monitor information in a way that allows it to be analyzed easily. The data held in the warehouse will typically remain on file for a year.
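As a rough illustration of the hand-written approach, the sketch below extracts rows from two sources, cleans them and places them in one standard format, and loads them into a repository. Everything in it is an assumption made for the example: the function names (`extract`, `transform`, `load`), the sample rows, the two date formats, and the country lookup table are hypothetical, and a plain Python list stands in for the warehouse.

```python
from datetime import datetime


def extract():
    # Hypothetical rows from two sources with inconsistent formats.
    crm_rows = [
        {"name": " Alice Smith ", "joined": "2011-10-13", "country": "us"},
    ]
    billing_rows = [
        {"name": "BOB JONES", "joined": "13/10/2011", "country": "United States"},
        {"name": "", "joined": "01/02/2011", "country": "US"},  # missing name
    ]
    return crm_rows + billing_rows


# Assumed lookup table mapping source spellings to one standard code.
COUNTRY_CODES = {"us": "US", "united states": "US"}


def parse_date(text):
    # Try each date format the (assumed) sources are known to use.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def transform(rows):
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        if not name:
            continue  # cleaning: drop rows that fail a basic quality check
        country = row["country"].strip().lower()
        cleaned.append({
            "name": name,
            "joined": parse_date(row["joined"]).isoformat(),  # one date standard
            "country": COUNTRY_CODES.get(country, "UNKNOWN"),
        })
    return cleaned


def load(rows, warehouse):
    # The repository here is just a list; a real one would be a database.
    warehouse.extend(rows)
    return len(rows)


warehouse = []
load(transform(extract()), warehouse)
for row in warehouse:
    print(row)
```

An ETL tool automates the same three steps, usually with configuration in place of hand-written functions like these.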

The key data warehouse systems, and the most widely used database engines for storing and serving data for enterprise business intelligence and performance management, are:

  • SAP BW - Business Information Warehouse (SAP Netweaver BI)
  • Microsoft SQL Server
  • Teradata
  • Oracle
  • IBM DB2 (Infosphere Warehouse)
  • SAS


