
Data Mining: What is Data Mining?

Overview
Data mining (sometimes called data or knowledge discovery) is the
process of analysing data from different perspectives and
summarizing it into useful information that can be used to increase
revenue, cut costs, or both. Data mining is an analytical tool for
analysing data. It allows users to analyse data from many different
dimensions or angles, categorize it, and summarize the relationships
identified. Technically, data mining is the process of finding
correlations or patterns among fields in large relational
databases.

Data, Information, Knowledge and Data Warehouses:


Data
Data are any facts, numbers, or text that can be processed by a
computer. These are of three types:
operational or transactional data, such as sales, cost,
inventory, payroll, and accounting
non-operational data, such as industry sales, forecast data,
and macro-economic data
meta data - data about the data itself, such as logical database
design or data dictionary definitions
Information
The patterns, associations, or relationships among all this data can
provide information. For example, analysis of retail point of sale
transaction data can yield information on which products are selling
and when.
Knowledge
Information can be converted into knowledge about historical
patterns and future trends. For example, summary information on
retail supermarket sales can be analysed in light of promotional
efforts to provide knowledge of consumer buying behaviour. Thus, a
manufacturer or retailer could determine which items are most
susceptible to promotional efforts.
Data Warehouses
Data warehousing is defined as a process of centralized data
management and retrieval. Data warehousing represents an ideal
vision of maintaining a central repository of all organizational data.

Centralization of data is needed to maximize user access and
analysis. Data analysis software is what supports data mining.

What can data mining do?


Data mining is primarily used today by companies with a strong
consumer focus - retail, financial, communication, and marketing
organizations. It enables these companies to determine relationships
among "internal" factors such as price, product positioning, or staff
skills, and "external" factors such as economic indicators,
competition, and customer demographics. It also enables them to
determine the impact on sales, customer satisfaction, and corporate
profits. Finally, it enables them to "drill down" into summary
information to view detailed transactional data.
Given databases of sufficient size and quality, data mining
technology can generate new business opportunities by providing
these capabilities:
Automated prediction of trends and behaviors. Data
mining automates the process of finding predictive information
in large databases. Questions that traditionally required
extensive hands-on analysis can now be answered quickly and
directly from the data. Predictive problems include
forecasting bankruptcy and other forms of default, and
identifying segments of a population likely to respond similarly
to given events.
Automated discovery of previously unknown patterns.
Data mining tools sweep through databases and identify
previously hidden patterns in one step.

How does data mining work?


Data mining consists of five major elements:
Extract, transform, and load transaction data onto the data
warehouse system.
Store and manage the data in a multidimensional database
system.
Provide data access to business analysts and information
technology professionals.
Analyze the data by application software.
Present the data in a useful format, such as a graph or table.
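
The five elements above can be illustrated with a very small sketch. It is not a real warehouse: the raw rows, the field layout, and the helper names (extract_transform, load, roll_up) are all hypothetical, and a plain dictionary keyed by (month, region, product) stands in for a multidimensional database.

```python
from collections import defaultdict

# Hypothetical raw export from a transaction system (all values are strings).
raw_rows = [
    "2024-01-05,north,milk,3.50",
    "2024-01-09,south,milk,4.00",
    "2024-02-11,north,bread,2.25",
    "2024-02-15,north,milk,3.50",
]

def extract_transform(rows):
    """Extract the fields of each row and transform them into typed records."""
    for row in rows:
        date, region, product, amount = row.split(",")
        # Keep only the year-month part of the date as the time dimension.
        yield date[:7], region, product, float(amount)

def load(records):
    """Load records into a tiny 'cube' keyed by (month, region, product)."""
    cube = defaultdict(float)
    for month, region, product, amount in records:
        cube[(month, region, product)] += amount
    return cube

def roll_up(cube, axis):
    """Analyze: total the cube along one dimension (0=month, 1=region, 2=product)."""
    totals = defaultdict(float)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)

cube = load(extract_transform(raw_rows))
by_region = roll_up(cube, axis=1)

# Present: print the result as a plain-text table.
for region, total in sorted(by_region.items()):
    print(f"{region:<6} {total:>6.2f}")
```

Calling roll_up with a different axis gives the same totals viewed by month or by product, which is the kind of access a multidimensional store is meant to make cheap.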


The technique used to perform these feats in data mining is
called modeling. Modeling is simply the act of building a model in
one situation where you know the answer and then applying it to
another situation where you don't. Computers are loaded with large
amounts of information about a variety of situations where the answer
is known, and the data mining software then runs through that data
and distills the characteristics that should go into the model. Once
the model is built, it can be used in similar situations where you
don't know the answer.
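
As a minimal sketch of this idea, the code below builds a one-rule model from cases where the answer is known and then applies it to cases where it is not. The data (customer spend and whether the customer responded to a promotion) and the function names are hypothetical, and a single spend threshold is only one of many possible model forms.

```python
def fit_threshold(examples):
    """Build the model: pick the spend threshold that best matches known answers.

    examples is a list of (spend, responded) pairs where the answer is known.
    The model predicts True when spend >= threshold.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted({spend for spend, _ in examples}):
        correct = sum((spend >= candidate) == responded
                      for spend, responded in examples)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def predict(threshold, spend):
    """Apply the model to a case where the answer is not known."""
    return spend >= threshold

# Situations where the answer is known (hypothetical history).
known = [(20, False), (35, False), (50, True), (80, True), (45, False)]
threshold = fit_threshold(known)

# Similar situations where the answer is not known.
new_customers = [30, 60]
predictions = [predict(threshold, spend) for spend in new_customers]
print(threshold, predictions)
```

A real tool would hold back part of the known data to check the model before trusting it on new cases.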
Data mining software analyzes relationships and patterns in stored
transaction data based on open-ended user queries. Several types of
analytical software are available: statistical, machine learning, and
neural networks. Generally, any of four types of relationships are
sought:
Classes: Stored data is used to locate data in predetermined
groups.
Clusters: Data items are grouped according to logical
relationships or consumer preferences.
Associations: Data can be mined to identify associations.
Sequential patterns: Data is mined to anticipate behavior
patterns and trends.
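
To make the association relationship concrete, here is a small sketch that counts how often pairs of items appear in the same basket. The baskets are made up, and the measure used (the fraction of baskets containing both items, commonly called support) is one simple choice among several.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "bread", "chips"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(baskets)
for pair, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, value)
```

Pairs with high support are candidates for an association; on real data the number of pairs grows quickly, so practical tools prune rare items first.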
Different levels of analysis are available:
Artificial neural networks: Non-linear predictive models that
learn through training and resemble biological neural networks
in structure.
Genetic algorithms: Optimization techniques that use
processes such as genetic combination, mutation, and natural
selection in a design based on the concepts of natural
evolution.
Decision trees: Tree-shaped structures that represent sets of
decisions. These decisions generate rules for the classification
of a dataset. Specific decision tree methods include
Classification and Regression Trees (CART) and Chi Square
Automatic Interaction Detection (CHAID).
Nearest neighbor method: A technique that classifies each
record in a dataset based on a combination of the classes of
the k record(s) most similar to it in a historical dataset
(where k ≥ 1). Sometimes called the k-nearest neighbor
technique.
Rule induction: The extraction of useful if-then rules from
data based on statistical significance.
Data visualization: The visual interpretation of complex
relationships in multidimensional data. Graphics tools are used
to illustrate data relationships.
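
Of the techniques listed above, the nearest neighbor method is the easiest to show in a few lines. The sketch below assumes numeric features, Euclidean distance as the similarity measure, and a simple majority vote as the way of combining classes; the historical records and labels are hypothetical.

```python
import math
from collections import Counter

def knn_classify(history, record, k=3):
    """Classify record by majority vote among its k nearest historical records.

    history is a list of (features, label) pairs; record is a feature tuple.
    """
    nearest = sorted(history, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (age, monthly spend) -> responded to offer.
history = [
    ((25, 20), "no"),
    ((30, 35), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
    ((35, 40), "no"),
]

label = knn_classify(history, (48, 85), k=3)
print(label)
```

In practice the features would usually be rescaled first so that no single field dominates the distance.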

What technological infrastructure is required?


There are two critical technological drivers:
Size of the database: the more data being processed and
maintained, the more powerful the system required.
Query complexity: the more complex the queries and the
greater the number of queries being processed, the more
powerful the system required.

An Architecture for Data Mining


To be applied to best effect, these advanced techniques must be fully
integrated with a data warehouse as well as flexible interactive
business analysis tools. Many data mining tools currently operate
outside of the warehouse, requiring extra steps for extracting,
importing, and analysing the data. The analytic data warehouse that
results from this integration can be applied to improve business
processes throughout the organization.

Integrated Data Mining Architecture

The ideal starting point is a data warehouse containing a
combination of internal data tracking all customer contact coupled
with external market data about competitor activity. This warehouse
can be implemented in a variety of relational database systems:
Sybase, Oracle, Redbrick, and so on, and should be optimized for
flexible and fast data access.
An OLAP (On-Line Analytical Processing) server enables a more
sophisticated end-user business model to be applied when
navigating the data warehouse. The multidimensional structures
allow the user to analyze the data as they want to view their
business. The Data Mining Server must be integrated with the data
warehouse and the OLAP server. Integration with the data
warehouse enables operational decisions to be directly implemented
and tracked. As the warehouse grows with new decisions and
results, the organization can continually mine the best practices and
apply them to future decisions.

Conclusion
Comprehensive data warehouses that integrate operational data
with customer, supplier, and market information have resulted in an
explosion of information. Competition requires timely and
sophisticated analysis on an integrated view of the data. However,
there is a growing gap between more powerful storage and retrieval
systems and users' ability to effectively analyze and act on the
information they contain. Both relational and OLAP technologies
have tremendous capabilities for navigating massive data
warehouses, but brute force navigation of data is not enough. A new
technological leap is needed to structure and prioritize information
for specific end-user problems. Data mining tools can make this
leap. Quantifiable business benefits have been proven through the
integration of data mining with current information systems, and
new products are on the horizon that will bring this integration to an
even wider audience of users.
