Data Warehousing

What Is Data Warehousing?

Data warehousing is the secure electronic storage of information by a business or other organization. The goal of data warehousing is to create a trove of historical data that can be retrieved and analyzed to provide useful insight into the organization's operations.

Data warehousing is a vital component of business intelligence. That wider term encompasses the information infrastructure that modern businesses use to track their past successes and failures and inform their decisions for the future.

How Data Warehousing Works

The need to warehouse data evolved as businesses began relying on computer systems to create, file, and retrieve important business documents. The concept of data warehousing was introduced in 1988 by IBM researchers Barry Devlin and Paul Murphy.

Data warehousing is designed to enable the analysis of historical data. Comparing data consolidated from multiple heterogeneous sources can provide insight into the performance of a company. A data warehouse is designed to allow its users to run queries and analyses on historical data derived from transactional sources.

Data added to the warehouse do not change and cannot be altered. The warehouse is the source that is used to run analytics on past events, with a focus on changes over time. Warehoused data must be stored in a manner that is secure, reliable, easy to retrieve, and easy to manage.

Maintaining the Data Warehouse

Maintaining a data warehouse involves several steps. The first is data extraction, which involves gathering large amounts of data from multiple source points. After a set of data has been compiled, it goes through data cleaning, the process of combing through it for errors and correcting or excluding any that are found.

The cleaned-up data are then converted from a database format to a warehouse format. Once stored in the warehouse, the data go through sorting, consolidating, and summarizing, so that they will be easier to use. Over time, more data are added to the warehouse as the various data sources are updated.
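
The extract, clean, and summarize steps described above can be illustrated with a minimal Python sketch. The source records, field names, and helper functions here are hypothetical and chosen only for illustration; a real warehouse pipeline would typically rely on dedicated tooling rather than hand-written functions like these.

```python
from collections import defaultdict

# Hypothetical raw records extracted from two source systems.
SOURCE_A = [
    {"customer": "alice", "amount": "120.50", "date": "2023-01-15"},
    {"customer": "bob", "amount": "n/a", "date": "2023-01-16"},  # bad amount
]
SOURCE_B = [
    {"customer": "Alice ", "amount": "79.50", "date": "2023-02-01"},
    {"customer": "carol", "amount": "200", "date": "2023-02-03"},
]


def extract(*sources):
    """Gather records from every source into one list."""
    return [row for source in sources for row in source]


def clean(rows):
    """Normalize customer names and exclude rows with a non-numeric amount."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # exclude the erroneous record
        cleaned.append({
            "customer": row["customer"].strip().lower(),
            "amount": amount,
            "month": row["date"][:7],  # keep only YYYY-MM
        })
    return cleaned


def summarize(rows):
    """Consolidate cleaned rows into total sales per customer per month."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["customer"], row["month"])] += row["amount"]
    return dict(totals)


warehouse_summary = summarize(clean(extract(SOURCE_A, SOURCE_B)))
```

In this sketch the record with the unreadable amount is excluded during cleaning, and the two spellings of the same customer name are consolidated before the totals are computed.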

A key book on data warehousing is W. H. Inmon's "Building the Data Warehouse," a practical guide that was first published in 1990 and has been reprinted several times.

Today, businesses can invest in cloud-based data warehouse software services from companies including Microsoft, Google, Amazon, and Oracle, among others.

Data Mining

Businesses warehouse data primarily for data mining. That involves looking for patterns of information that will help them improve their business processes.

A good data warehousing system makes it easier for different departments within a company to access each other's data. For example, a marketing team can assess the sales team's data in order to make decisions about how to adjust their sales campaigns.

The 5 Steps of Data Mining

The data mining process breaks down into five steps:

  1. An organization collects data and loads it into a data warehouse.
  2. The data are then stored and managed, either on in-house servers or in a cloud service.
  3. Business analysts, management teams, and information technology professionals access and organize the data.
  4. Application software sorts the data.
  5. The end-user presents the data in an easy-to-share format, such as a graph or table.

Data Warehousing vs. Databases

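A data warehouse is not the same as a database. A database is a transactional system that monitors and updates real-time data in order to have only the most recent data available. A data warehouse, by contrast, accumulates data over time so that the historical record remains available for analysis.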

For example, a database might only have the most recent address of a customer, while a data warehouse might have all the addresses for the customer for the past 10 years.
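
The address example can be sketched with Python's built-in sqlite3 module. The table names, columns, and the record_address helper are assumptions made for illustration, not a prescribed warehouse design: the operational table overwrites the customer's address, while the history table only ever appends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE customer_address_history (
        customer_id INTEGER, address TEXT, valid_from TEXT
    );
""")


def record_address(customer_id, address, valid_from):
    """Store a new address in both the operational and the historical table."""
    # Operational database: overwrite, so only the latest address survives.
    conn.execute(
        "INSERT OR REPLACE INTO customer (id, address) VALUES (?, ?)",
        (customer_id, address),
    )
    # Warehouse-style table: append, so earlier addresses are never lost.
    conn.execute(
        "INSERT INTO customer_address_history VALUES (?, ?, ?)",
        (customer_id, address, valid_from),
    )


record_address(1, "12 Oak St", "2015-03-01")
record_address(1, "48 Pine Ave", "2019-07-15")
record_address(1, "9 Elm Rd", "2023-01-10")

# The operational table knows only the most recent address.
current = conn.execute("SELECT address FROM customer WHERE id = 1").fetchone()[0]

# The history table can answer questions about change over time.
history = [
    row[0]
    for row in conn.execute(
        "SELECT address FROM customer_address_history "
        "WHERE customer_id = 1 ORDER BY valid_from"
    )
]
```

Here `current` holds a single address, while `history` holds all three in chronological order.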

Data mining relies on the data warehouse. The data in the warehouse are sifted for insights into the business over time.

Advantages and Disadvantages of Data Warehouses

Data warehousing is intended to give a company a competitive advantage. It creates a resource of pertinent information that can be tracked over time and analyzed in order to help a business make more informed decisions.

It also can drain company resources and burden its current staff with routine tasks intended to feed the warehouse machine.

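The potential disadvantages of maintaining a data warehouse identified by the Corporate Finance Institute include the following. It takes considerable time and effort to create and maintain the warehouse. Gaps in information, caused by human error, can take years to surface, damaging the integrity and usefulness of the information.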

Data Warehouse FAQs

Here are the answers to some commonly asked questions about data warehousing.

What Is a Data Warehouse and What Is It Used For?

A data warehouse is an information storage system for historical data that can be analyzed in numerous ways. Companies and other organizations draw on the data warehouse to gain insight into past performance and plan improvements to their operations.

What Is a Data Warehouse Example?

Consider a company that makes bicycles and is deciding which new models to build. It goes to its data warehouse to understand its current customers better. It can find out whether its customers are predominantly women over 50 or men under 35. It can learn more about the retailers that have been most successful in selling its bikes, and where they're located. It might be able to access in-house survey results and find out what its past customers have liked and disliked about its products.

All of this information helps the company to decide what kind of new model bicycles it wants to build and how it will market and advertise them. It's hard information rather than seat-of-the-pants decision-making.

What Are the Stages of Data Warehousing?

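There are at least seven stages to the creation of a data warehouse, according to ITPro Today, an industry publication. They include determining the business objectives and their key performance indicators; collecting and analyzing the appropriate information; identifying the core business processes that contribute the key data; and constructing a conceptual data model that shows how the data are displayed to the end-user.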

Is SQL a Data Warehouse?

SQL, or Structured Query Language, is a computer language that is used to interact with a database in terms that it can understand and respond to. It contains a number of commands such as "select," "insert," and "update." It is the standard language for relational database management systems.
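
As a rough illustration of the three commands named above, the following sketch runs them against an in-memory SQLite database through Python's built-in sqlite3 module. The product table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# "insert" adds a new row to the table.
conn.execute(
    "INSERT INTO product (name, price) VALUES (?, ?)", ("bicycle", 499.0)
)

# "update" changes values in rows that already exist.
conn.execute("UPDATE product SET price = ? WHERE name = ?", (449.0, "bicycle"))

# "select" reads rows back out of the table.
rows = conn.execute("SELECT name, price FROM product").fetchall()
```

The same statements, with minor dialect differences, are understood by most relational database management systems.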

A database is not the same as a data warehouse, although both are stores of information. A database is an organized collection of information. A data warehouse is an information archive that is continuously built from multiple sources.

The Bottom Line

The data warehouse is a company's repository of information about its business and how it has performed over time. Created with input from employees in each of its key departments, it is the source for analysis that reveals the company's past successes and failures and informs its decision-making.
