
Data mining is a key member of the Business Intelligence (BI) product family, along with online
analytical processing (OLAP), enterprise reporting and ETL (extraction, transformation and
loading of data). Data mining is about analyzing data and searching for hidden patterns using
automatic or semi-automatic methods. Over the last decade, large volumes of data have been
stored in databases. As a result, organizations have become rich in data and information but
poor in knowledge, with data collections growing so large that the practical use of these
stores has become limited. The main goal of data mining is to extract hidden patterns from
these data, increasing their intrinsic value and turning data into knowledge.

Data mining delivers substantial business value to companies. For example, growing competition
driven by modern marketing and distribution channels such as the Internet and
telecommunications means that companies now face competitors worldwide, and the key to
business success is the ability to retain existing customers and acquire new ones. Data mining
provides the technologies that allow companies to analyze the factors that affect these
matters.

Algorithms are more accurate and more efficient, and they can handle increasingly complex
data. In addition, data mining application programming interfaces (APIs) have been
standardized, allowing developers to build better applications.

Churn patterns: which have a higher probability of ..

2.3 Data Mining Tasks

Data mining can be used to solve hundreds of business problems. Based on the nature of these
problems, they can be grouped into the following tasks.

2.3.1 Classification

Classification is one of the most popular data mining tasks. Business applications such as
customer churn analysis, risk management and ad targeting usually involve classification.
Classification refers to assigning cases to categories based on a predictable attribute. Each
case contains a set of attributes, one of which is the class attribute (the predictable
attribute). The task requires finding a model that describes the class attribute as a function
of the input attributes. Typical classification algorithms include decision trees, neural
networks and Bayesian networks [Tan05].

2.3.2 Clustering

Clustering, also called segmentation, is used to identify natural groupings of cases based on
a set of attributes. Cases within the same group have similar attribute values. Clustering is
an unsupervised data mining task, that is, no single attribute guides the process. All input
attributes are treated equally, and success consists of grouping individuals into segments
that are meaningful for the objective of the project. Most clustering algorithms build the
model through a number of iterations and stop when the model converges, that is, when the
boundaries of the segments stabilize [Tan05].

2.3.3 Association

Association is another of the most popular data mining tasks. It is also called market basket
analysis. A typical association problem in business applications is to analyze a sales
transaction table and identify the products that are often sold in the same shopping basket.
The common use of association is to identify common sets of items and rules for the purpose of
cross-selling. In association terms, each product or, more generally, each attribute/value
pair is considered an item. The association task has two goals: to find frequent itemsets and
to find association rules [Tan05].

2.3.4 Regression

The regression task is similar to classification. The main difference is that the predictable
attribute is a continuous number. Regression techniques have been widely studied for centuries
in the field of statistics. Linear regression and logistic regression are the most popular
regression methods. Other regression techniques include regression trees and neural networks
[Tan05]. Regression tasks can solve many business problems. For example, they can be used to
predict redemption rates for discount coupons based on face value, distribution method and
distribution volume, or to predict wind velocity based on temperature, air pressure and
humidity [Tan05].

2.3.5 Forecasting

Forecasting is another important data mining task. It is the projection of a trend with
respect to time or another variable, and consists of estimating and analyzing future demand
for a particular product, component or service, using historical sales data, marketing
information or promotional information as input.

2.3.6 Deviation Analysis

This task finds the cases that behave very differently from the others. It is also called
outlier detection, which refers to the detection of significant changes from previously
observed behavior. Deviation analysis can be used in many applications. The most common is
credit card fraud detection. Identifying the abnormal cases among millions of transactions is
a very difficult task. Other applications include network intrusion detection, error
generation analysis, and so on [Tan05].
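
To make the deviation analysis task more concrete, here is a minimal sketch that flags values lying far from the rest of a sample. It uses a modified z-score built on the median and the median absolute deviation, which is one common choice and not a method taken from [Tan05]. The function name `find_outliers`, the 3.5 cutoff and the sample amounts are all illustrative assumptions; real fraud detection would use many more attributes than a single amount.

```python
from statistics import median

def find_outliers(values, cutoff=3.5):
    """Return the values whose modified z-score exceeds `cutoff`."""
    center = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - center) / mad > cutoff]

# Made-up transaction amounts, for illustration only.
amounts = [12.5, 9.9, 14.0, 11.2, 13.1, 10.4, 950.0, 12.0]
outliers = find_outliers(amounts)
print(outliers)
```

The median-based score is used here because a single extreme value inflates the mean and standard deviation, which can hide the very case one is trying to find.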

You aim to collect readings from a sensor network distributed across the whole system, and use
them to build a probabilistic representation of consumption, demand, quality and status. With
this model you can predict all those quantities and then use the predictions to control the
system.

It all depends on how strong your data mining is. If it is just based on AI expert rules, then
the applications are as in the books. I'm not a civil engineer; Tariq, on the other hand,
mentions a number of advanced and exciting applications...
If your data mining is very good, you then have the freedom to use any data collection that
deals with construction, planning, maintenance, accidents, whatever is within reach. The
process of working with such a mix of data would simply be asking: what phenomena exist there
that can be formulated and used to improve the engineering, the use, quality, cost, business
or other objectives that one may have?
If you have a good data mining tool, the data always tells you new, unexpected things, some of
which were not asked about but rather popped out as discoveries.
Anyway, as an engineer it would be better to do what Tariq suggests: start with existing
knowledge and see whether it falls in line or not.
Regards,
Edith
Home of GT data mining
You could study transportation aspects, such as patterns in the flow of people when they move
from home to work. You can get data from public transportation companies, or maybe Google, and
try to discover emerging patterns or problems in the transportation system.

It is a science that studies a stage of data discovery that identifies and finds hidden
patterns in data that are not visible at first glance. In fact, everything I have seen so far
consists of methodologies for discovering patterns in health, communications, computing,
customer acquisition, and so on.

Vahid, the aim is to predict two real valued responses, Y1:Heating Load, Y2:Cooling Load. That is,
you have a regression problem. The authors and maintainers of the data set also suggest a multi-
class classification problem if the response (Y1 or Y2) is rounded to the nearest integer (the
classes). The aim is to use the eight features (X1 to X8) to predict each of the two
responses: the X features are used to predict Y1 and, separately, to predict Y2.

Features:
X1 Relative Compactness
X2 Surface Area
X3 Wall Area
X4 Roof Area
X5 Overall Height
X6 Orientation
X7 Glazing Area
X8 Glazing Area Distribution
Y1 Heating Load
Y2 Cooling Load
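
As a rough sketch of the setup described above, the snippet below fits one model per response and then rounds a prediction to the nearest integer to obtain a class label. To stay short it uses ordinary least squares on a single feature (X5, Overall Height), whereas a real model would use all eight. The rows are made-up numbers chosen for easy arithmetic, not values from the actual data set, and the names `fit_line`, `predict` and `to_class` are hypothetical helpers.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model, x):
    intercept, slope = model
    return intercept + slope * x

def to_class(load):
    """Round a non-negative real-valued load to the nearest integer class."""
    return math.floor(load + 0.5)

# Made-up illustrative rows, NOT values from the real data set.
x5_height = [3.5, 3.5, 7.0, 7.0]
y1_heating = [10.0, 12.0, 24.0, 26.0]
y2_cooling = [13.0, 15.0, 27.0, 29.0]

# One model per response, both built from the same input feature.
heating_model = fit_line(x5_height, y1_heating)
cooling_model = fit_line(x5_height, y2_cooling)

heating_pred = predict(heating_model, 7.0)
cooling_pred = predict(cooling_model, 7.0)
print(heating_pred, to_class(heating_pred))
print(cooling_pred, to_class(cooling_pred))
```

Keeping the two responses in separate models mirrors the description in the post; the rounding step is what turns the regression problem into the multi-class variant.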

We have recently used the concept to develop decision trees for damage prediction in
reinforced concrete (RC) structures.

Data mining and machine learning have endless applications in civil engineering and, really,
in any other area. The problem is how good the data you are mining is and how good the mining
tools you are using are.

Several data mining studies focus on diagnosis, using classification methods to detect
patterns of structural failure, deviation from accepted quality limits, or critical points.

But, as in any other area, data mining could be even more useful in civil engineering for
uncovering unknown, hidden information. I mean there are several facts that won't be obvious
until some data brings them to light.
Think about answers to questions like: What's the weakest link of a structure? What's the
least useful piece of a building? What is the most critical combination for an electrical
installation?

Work on small area estimation arises from the continuous and growing need to estimate more
precise indicators for different geographic areas. The techniques to be discussed will make it
easier for producers of statistical information to disseminate, more frequently, precise
estimates at a level of geographic disaggregation that today, in most countries, is achieved
only by conducting censuses, without incurring the high costs of those projects. To this end,
sample results are combined with available census data and administrative records.

As mentioned before, data mining and machine learning can be applied to a wide range of topics.
We at Microsoft apply them to software development and code bases. But the most important part
is to understand the data in the first place. It sounds trivial, but it is not always that
easy, although it is key. Why does the data look the way it does, and how should the results
of recommendation or prediction models be interpreted? Is the data clean, or is it biased? Do
the entities represent the general population of entities or only a subset? If you know how
representative your data is and how to interpret the raw data as well as the modelled or
predicted data, you can apply data mining to almost anything.
At Microsoft, we have had quite some success modeling dependency graphs (e.g. social networks)
and organizational structure to predict and estimate their impact: how should we restructure
the organization to reach goal X, or how does collaboration or missing communication influence
quality? I guess the very same should hold for civil engineering as well.

Big data: a tremendous resource for the coal industry

In an industry celebrated for extracting and moving enormous quantities of raw
material around the world, we have been less than enthusiastic in embracing the big
data we generate, although it has potential to be a tremendous resource for improved
health, safety and efficiency. So, how do we effectively mine these data?
Examine just the data generated over the course of a single day in an underground
mine: sensing of mine gases at critical locations; fan pressures and operating
characteristics; barometric pressures; temperature; dust and other particulate matter
(DPM) concentration and size fractions; conveyor belt monitoring; equipment location;
geophysical parameters; function of cutting and hauling equipment, from load to
maintenance indicators; tracking of supplies and people; and this is hardly an
exhaustive list. The challenges in managing and utilising these data sets are daunting,
but the opportunities are limitless.
As an example, consider only those data sets that characterise the function of mine
ventilation systems and the health of the mine environment. These data are generally
logged, and threshold levels, set by regulatory limit or other safety standard, are
configured to trip an alarm that alerts designated persons: for instance, a high carbon
monoxide (CO) reading on a conveyor belt or a high methane (CH4) reading in a
return air course would trigger notification of affected persons and investigation of the
source.
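
The threshold alarm described above can be sketched in a few lines. This is only an illustration of the idea: the `Reading` class, the `check_alarms` function and especially the numeric limits in `LIMITS` are assumptions made up for the example, and real limits would come from the applicable regulation or safety standard.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    location: str
    gas: str      # e.g. "CO" or "CH4"
    value: float  # concentration, in the same unit as the limit

# Hypothetical limits for illustration only; not regulatory values.
LIMITS = {"CO": 50.0, "CH4": 1.0}

def check_alarms(readings, limits=LIMITS):
    """Return one alert message per reading that exceeds its limit."""
    alerts = []
    for r in readings:
        limit = limits.get(r.gas)
        if limit is not None and r.value > limit:
            alerts.append(f"ALARM {r.gas} at {r.location}: {r.value} > {limit}")
    return alerts

# Made-up sample readings.
readings = [
    Reading("conveyor belt 3", "CO", 62.0),
    Reading("return air course", "CH4", 0.4),
]
alerts = check_alarms(readings)
for message in alerts:
    print(message)
```

A check like this looks at each reading in isolation, which is exactly the limitation the article goes on to discuss: it says nothing about trends or relationships between variables.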
No doubt, these applications have prevented many accidents, but rarely are these data
utilised to their full capacity. Manual trending and study of these data sets is
time-intensive, and automatic trending of so many variables rarely allows for the emergence
of salient indicators. However, full application of data analytics and science to the
mining industry could provide early insight into the failure of systems or allow for early
identification of opportunities for improvement.

Considering the coal mine user of big data


An operation must first take a fairly sophisticated approach towards the collection and
storage of data:

- Operations must implement robust and reliable systems for data collection and
  storage. In other words, there must be a quantity of data and confidence in the
  quality of these data.
- Data must be synthesised. In complex systems, data are collected over many
  domains. Data synthesis may include assimilating the data in time or space.
- The way in which data are analysed and applied must be designed with many
  factors in mind, with emphasis on the end user.

Analysis must allow for the emergence of the most important data, correlation and
indications of causality, and perhaps even predictive capabilities. For instance,
in examining the ventilation data referred to above, if increasing methane emissions
are observed over time, can they be correlated with increased production rate? With
dropping barometric pressure? And can causality be verified?
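
As a small illustration of the correlation question raised above, the sketch below computes the Pearson correlation coefficient between methane readings and two candidate drivers. The function `pearson` is written out by hand so the snippet has no dependencies, and the three series are made-up numbers for illustration, not measurements from any mine.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up shift-by-shift values, for illustration only.
methane = [0.30, 0.35, 0.42, 0.50, 0.55]     # methane reading
production = [100, 120, 150, 180, 200]       # production rate
pressure = [1015, 1012, 1008, 1003, 1000]    # barometric pressure

r_production = pearson(methane, production)
r_pressure = pearson(methane, pressure)
print(round(r_production, 3), round(r_pressure, 3))
```

In this toy data methane rises with production and falls with pressure, so both coefficients are strong. That alone does not settle which factor, if either, is the cause; verifying causality needs further analysis or controlled observation.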
As large quantities of data for mining applications are examined, the user is one of the
most important considerations. With the advances that have already been realised in
underground wireless communication, the day when we can communicate data to
every person working in a mine has arrived, but the demands of the underground
environment certainly present challenges. First, a handheld device for communication
that is easily portable, rugged, appropriate for use in potentially explosive
environments and allows for visual communication would be an achievement. Next,
using the appropriate techniques for imparting knowledge about the operation visually
and in relatively little time is critical: consider the information people glean with
only a glance at a smartphone.
Imagine a longwall shearer operator arriving for a shift and accessing an interactive
screen that provides information about methane and dust levels across the face,
correlated with production and barometric pressure, as well as air provided to the face
and bleeder and gob indicators. Such data could be presented in a manner that allows
for rapid visual understanding. Underground, this operator has access to a rugged and
portable tablet with the same information updated in real time. So do other workers
and supervisors. The opportunities to enhance decision making with real implications
for health, safety and efficiency are evident.
Further, these concepts can be applied to other safety indicators, such as:

- Monitoring of ground.
- Maintenance, allowing for targeted and improved preventative maintenance.
- Production, allowing for rapid identification of inefficiencies and bottlenecks with
  opportunity for holistic improvement.
- Quality, allowing for better analysis and assurance of products.

Empowering coal miners


Undeniably, the greatest resource in the world coal industry is its people. Everyone,
from the miners at the pit face to the CEO, is entrusted with decisions each day that
can impact the health and safety of coworkers and communities, as well as the viability
of the operation. Improved utilisation of our own big data provides the information
needed to make better informed decisions.