Predictive Maintenance Using the R Statistical Language for Predictive Analytics
Posted on 26 August 2013.
Introduction

Predictive analytics consists of data processing techniques that focus on predicting future outcomes by analyzing previously collected data. Organizations are increasingly adopting predictive analytics, and adopting it more broadly. Many now use dozens or even thousands of predictive analytic models. These models are increasingly used in real-time decision making and in operational production systems. Models are used to improve customer treatment by selecting the next best action to develop a customer, to make loan or credit pricing decisions that reflect the future risk of a transaction, to predict the likelihood of equipment failure to drive proactive maintenance decisions, or to detect potentially fraudulent transactions so they can be routed out of the system before they hit the bottom line. Examples like these deliver high ROI from analytics.

Predictive maintenance and quality

Predictive maintenance is just what its name implies: maintaining components or assets, large or small, according to fact-based expectations for when they will fail or require service. These facts can include:

- Real-time device status: How is the part performing right now?
- Historical device data: How has the part performed in the past?
- Data for similar devices: How have other, similar parts performed?
- Maintenance records: When was the part last serviced or replaced?
- Maintenance schedules: What does the manufacturer recommend?

Of course, all of this Big Data is meaningless without analysis. There are hidden patterns lurking within these facts and figures. Decoding these patterns is what powers predictive maintenance and separates it from more traditional, reactionary approaches to equipment repair and replacement.

Traditional approaches to maintenance

Predictive maintenance differs considerably from the traditional approaches to determining when to service or replace equipment.
For years, companies have kept their production lines running through a combination of these maintenance methods:

- Reactive: Service or replace equipment after it fails
- Preventive: Service or replace equipment according to the manufacturer's suggested schedule, the amount of time it has been in service, or operational observations
- Condition-Based: Service or replace equipment based on monitoring performed to assess its current condition

The problem with these old-school approaches is their high cost. Waiting until a component fails means lost production time and revenue. In-person inspections are expensive and can lead to replacing parts unnecessarily, based only on the inspector's best guess. Following the manufacturer's recommended maintenance schedule saves on inspection costs but often results in replacing parts that are still functioning well and could continue to do so.

Tmilinovic's Blog

One solution for decreasing operational cost and increasing manufacturing system availability (Figure 1) is to manage all maintenance activities continuously and to control degradation, moving toward predictive maintenance.

Figure 1: Decreasing of failure rate through predictive maintenance

Device events are supplied to the solution either in real time or as a batch and are transformed into the format required by the solution. The information in the events is recorded in the analytic data store along with aggregated key performance indicators (KPIs) and profiles. The KPIs are accumulated over time and show trends. The profiles indicate the current state of the device and can include statistical calculations of variation. For example, events containing the temperature and operating load of a transformer can be aggregated as a KPI of the average temperature and load per day. The operating load can also be aggregated as a profile to record the most recent load and the variability of the load over time.
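The KPI and profile aggregation described above can be sketched in base R. This is only a minimal illustration under assumptions: the `events` data frame, its column names, and the simulated readings are hypothetical and are not taken from any particular product or data store.

```r
# Hypothetical device events for one transformer: one reading every 6 hours.
# Column names and values are invented for illustration.
set.seed(1)
events <- data.frame(
  timestamp   = as.POSIXct("2013-08-01 00:00:00", tz = "UTC") + 3600 * 6 * (0:11),
  temperature = round(rnorm(12, mean = 65, sd = 4), 1),
  load        = round(runif(12, min = 0.4, max = 0.9), 2)
)
events$day <- as.Date(events$timestamp)

# KPI: average temperature and load per day, accumulated over time.
kpi_daily <- aggregate(cbind(temperature, load) ~ day, data = events, FUN = mean)

# Profile: current state of the device (most recent load) plus a
# statistical measure of variation (standard deviation of the load).
latest <- events[which.max(events$timestamp), ]
profile <- list(
  last_load = latest$load,
  load_sd   = sd(events$load)
)

print(kpi_daily)
print(profile)
```

In this sketch the KPI table grows by one row per day and shows the trend, while the profile is overwritten as new events arrive and always reflects the latest state.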
The information in the analytic data store is used to perform predictive scoring, a process that uses a mathematical model to put a numerical value on the likelihood that a device or component failure will occur. We then use a predefined set of rules to determine the appropriate actions to take in response to various scores. For example, if a score indicates that the probability of a transformer failure is less than 0.7 (70%), the rules may call for no immediate action. If the score rises above 0.8 (80%), the rules may trigger a request to have a physical inspection performed.

Scoring is a key part of predictive maintenance and involves the use of predictive models that use historical data to determine the probability of certain future outcomes. For example, a model could be created based on historical data regarding transformer temperature, current load, and occurrences of failure.

R Statistical language for Predictive Analytics

Captured data is continuously scored using predictive analytics software. Predictive analytic models mine the data and correlate past failures using multivariate analysis. The models can mine all the variables and conditions that contributed to past failures in order to predict future failures. Incoming data are then run through the model, and asset health scores are generated in real time.

The processing cycle typically involves two phases:

1. Training phase: Learn a model from training data
2. Predicting phase: Deploy the model to production and use it to predict the unknown or future outcome

R is an open source language/environment licensed under the GPL 2. Predictive analytics is tightly integrated with the algorithms and statistical libraries available in R. Oracle has its own version of R, called Oracle R Enterprise, for better customization of analytics using Oracle databases. SAS Institute added connectivity to R from its SAS/IML and JMP products some time ago.
SPSS, the analytics software acquired by IBM, was one of the first software products to work with R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. Using R for predictive analytics is a low-cost and flexible solution, but it does require a basic knowledge of statistics and mathematics.

R is a very powerful language for a number of reasons. The main feature is vector processing: the ability to perform operations on entire arrays of numbers without explicitly writing iteration loops. Another important feature is that statisticians, engineers and scientists can improve the software's code or write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs, and mining techniques to dig deeper into databases.

Choose the right model

An important phase to consider is the actual analysis phase. When choosing a type of modeling, it is key to consider whether the main task is to provide a result that is as significant as possible or one that also needs to be presented to business users or engineers. In many cases, decision trees have proven to be a very good approach for classification analysis. In particular, the option to build the trees manually and, in so doing, to include domain know-how is very powerful and well received by many customers.

Also make sure that you create a holdout sample of your data to test the models for stability and predictive power on unknown data. Otherwise you might end up with a result that works for the current data but will not work on other data in the future. This phenomenon is called overfitting.
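A holdout check of this kind might look like the following sketch. The data are simulated and the column names (`temperature`, `load`, `vibration`, `failed`) are hypothetical. The 70/30 split and the use of the rpart package are illustrative choices, not something the text prescribes.

```r
library(rpart)  # recommended package shipped with standard R distributions

set.seed(42)
n <- 500
# Simulated, hypothetical sensor data; failure depends on temperature and load.
d <- data.frame(
  temperature = rnorm(n, 70, 10),
  load        = runif(n),
  vibration   = rnorm(n)
)
p_fail <- plogis(-8 + 0.1 * d$temperature + 1.5 * d$load)
d$failed <- factor(ifelse(runif(n) < p_fail, "yes", "no"), levels = c("no", "yes"))

# Hold out 30% of the records; the models never see them during training.
holdout_idx <- sample(n, size = round(0.3 * n))
train   <- d[-holdout_idx, ]
holdout <- d[holdout_idx, ]

# Share of records whose predicted class matches the observed class.
accuracy <- function(model, data) {
  mean(predict(model, newdata = data, type = "class") == data$failed)
}

# A deliberately overgrown tree (no complexity penalty, tiny nodes allowed)
# versus a tree grown with rpart's default settings.
deep_tree    <- rpart(failed ~ ., data = train, method = "class",
                      control = rpart.control(cp = 0, minsplit = 2, minbucket = 1))
default_tree <- rpart(failed ~ ., data = train, method = "class")

results <- data.frame(
  model   = c("deep", "default"),
  train   = c(accuracy(deep_tree, train),   accuracy(default_tree, train)),
  holdout = c(accuracy(deep_tree, holdout), accuracy(default_tree, holdout))
)
print(results)
```

A large gap between the training and holdout columns for a model is the usual symptom of overfitting. A model whose two accuracies are close is more likely to hold up on future data.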
Four predictive models are used:

- Health Score (binary logit model)
- Lifespan Analysis (Cox Regression model)
- Random Forests CART (Classification and Regression Trees)
- Time series models

Health Score (Binary logit model)

The Health Score model is based on the logistic regression model and measures the likelihood that an asset or process will fail. The model uses historical defect data, operational information, and environmental data to determine the current operational health of an asset, and continuously monitors the asset to predict potential faults in real time. The resulting health score value, typically referred to simply as the Health Score, can also be used to predict the future health of the asset. The Health Score is presented as a number between 0 and 1. The higher the number, the healthier the asset. The overall Health Score for an entire manufacturing site is the average of the scores of the individual assets at that site. If the structure of the input data model is modified, the health score model must be retrained on the new data.

A well-established statistical method for predicting binomial outcomes is required to predict the health score value, and the solution uses a binomial logistic algorithm for this purpose. In binomial or binary logistic regression, the outcome can have only two possible types of values (e.g. "Yes" or "No", "Success" or "Failure"). Multinomial logistic regression refers to cases where the outcome can have three or more possible types of values (e.g., "good" vs. "very good" vs. "best"). Generally the outcome is coded as "0" and "1" in binary logistic regression. This kind of algorithm is limited to models where the target field is a flag or binary field. The algorithm provides enhanced statistical output when compared to a multinomial algorithm and is less susceptible to problems when the number of table cells (unique combinations of predictor values) is large relative to the number of records.
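A binary logit health score of this kind could be sketched in base R as follows. The data are simulated and the column names are hypothetical. Defining the health score as one minus the predicted failure probability is an assumption made here so that higher values mean healthier assets, as the text describes.

```r
set.seed(7)
n <- 400
# Simulated, hypothetical asset history (column names are illustrative only).
assets <- data.frame(
  temperature = rnorm(n, 70, 8),
  load        = runif(n, 0.2, 1.0),
  age_years   = runif(n, 0, 20)
)
p_true <- plogis(-9 + 0.08 * assets$temperature + 2 * assets$load +
                   0.1 * assets$age_years)
assets$failed <- rbinom(n, size = 1, prob = p_true)  # 1 = failure, 0 = no failure

# Binary logit model: the outcome is coded 0/1.
fit <- glm(failed ~ temperature + load + age_years,
           family = binomial(link = "logit"), data = assets)

# Predicted failure probability for two new, hypothetical assets.
new_assets <- data.frame(temperature = c(62, 85),
                         load        = c(0.3, 0.95),
                         age_years   = c(2, 18))
p_fail <- predict(fit, newdata = new_assets, type = "response")

# Assumption: health score = 1 - P(failure), so higher means healthier.
health_score <- 1 - p_fail
print(round(health_score, 3))

# Site-level score as the average of the individual asset scores.
print(mean(health_score))
```

Because `type = "response"` returns probabilities, the resulting scores always fall between 0 and 1.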
R provides comprehensive support for multiple linear regression. To fit a logistic regression model, R's glm() function is used. It is similar to lm(), the "linear model" function, but takes additional parameters. The format is

    glm(Y ~ X1 + X2 + X3, family = binomial(link = "logit"), data = mydata)

Here, Y is the dependent variable and X1, X2 and X3 are the independent variables.

Lifespan Analysis (Cox Regression model)

The Lifespan Analysis model estimates a device's remaining lifespan when functioning in a real-world scenario. Depending on the device, lifespan can be measured in hours, miles, stress cycles, or any other metric. Data on the functional condition of the device is collected from laboratory experiments. The Lifespan Analysis model analyzes time-to-failure event data. Lifespan analysis is an offline, back-end process and can be performed at regular intervals or on demand.

The model is based on the Cox regression model. The Cox Regression technique is particularly well-suited to the many cases where the quantity to predict is the time to a certain event (such as a failure). This technique was originally developed to predict the life expectancy of cancer patients, but it can be used just as well for technical analysis. Cox Regression can also take potential influence factors into account and fine-tune its failure estimates accordingly. The shape of the survival function and the regression coefficients for the predictors are estimated from observed subjects; the model can then be applied to new cases that have measurements for the predictor variables. For Cox Regression analysis we can use the R package named survival (http://cran.r-project.org/web/packages/survival/).

Random Forests CART (Classification and Regression Trees)

CART models offer an intuitive overview of a multivariate data set and are suitable for dealing with complex processes and nonlinear relationships. They are also able to recognize the parameters that are most important to a given regression problem.
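As a rough illustration of how a single CART model exposes the most influential parameters, the sketch below fits a regression tree with the rpart package. The process data are simulated and the variable names (`temperature`, `pressure`, `humidity`, `wear`) are hypothetical.

```r
library(rpart)  # recommended package shipped with standard R distributions

set.seed(3)
n <- 300
# Simulated, hypothetical process data: wear depends nonlinearly on
# temperature and pressure; humidity is pure noise.
proc <- data.frame(
  temperature = runif(n, 50, 100),
  pressure    = runif(n, 1, 5),
  humidity    = runif(n, 20, 80)
)
proc$wear <- ifelse(proc$temperature > 80, 10, 2) +
  3 * (proc$pressure > 3.5) + rnorm(n, sd = 0.5)

# Regression tree built by binary recursive partitioning.
tree <- rpart(wear ~ temperature + pressure + humidity,
              data = proc, method = "anova")

# Importance of each predictor as computed by rpart (largest first).
print(tree$variable.importance)

# Text representation of the fitted splits.
print(tree)
```

If the partykit package is installed, `plot(partykit::as.party(tree))` should draw the same tree graphically.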
However, CART models suffer from high prediction variance. Therefore, for prediction purposes we use a method that utilizes an ensemble of CART models, called Random Forests. The aggregation of a large number of different single models usually offers improved prediction accuracy. Aggregating the results of single tree models reduces variance and produces more stable models. Furthermore, adding more trees to the forest does not lead to overfitting, a consequence of the law of large numbers.

The regression tree model is constructed using binary recursive partitioning routines as implemented in the R package rpart (http://cran.r-project.org/web/packages/rpart/index.html) and plotted using routines from the R package partykit (http://cran.r-project.org/web/packages/partykit/index.html).

The methodology allows a transition from time-based to condition-based maintenance, reduces problem complexity, and offers high predictive performance. As the Random Forest approach is free of parametric or distributional assumptions, the method can be applied to a wide range of predictive maintenance problems. This leads to a reduction of tool downtime, maintenance and manpower costs and improves competitiveness in the semiconductor industry.

Time series models

Time series models are used for predicting or forecasting the future behavior of variables. These models account for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted for. As a result, standard regression techniques cannot be applied directly to time series data, and methodology has been developed to decompose the series into trend, seasonal and cyclical components. Modeling the dynamic path of a variable can improve forecasts, since the predictable component of the series can be projected into the future. Time series models estimate difference equations containing stochastic components.
Two commonly used forms of these models are autoregressive (AR) models and moving average (MA) models. The Box-Jenkins methodology (1976), developed by George Box and G.M. Jenkins, combines the AR and MA models to produce the ARMA (autoregressive moving average) model, which is the cornerstone of stationary time series analysis. ARIMA (autoregressive integrated moving average) models, on the other hand, are used to describe non-stationary time series.

In recent years time series models have become more sophisticated and attempt to model conditional heteroskedasticity, with models such as ARCH (autoregressive conditional heteroskedasticity) and GARCH (generalized autoregressive conditional heteroskedasticity) frequently used for financial time series. In addition, time series models are also used to understand interrelationships among economic variables represented by systems of equations, using VAR (vector autoregression) and structural VAR models.

We are using the R forecast package (http://cran.r-project.org/web/packages/forecast/).
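A minimal ARMA fit and forecast might look like the sketch below. To keep it runnable without extra installs, it uses `arima()` and `predict()` from base R's stats package rather than the forecast package. The series is simulated and stands in for a hypothetical sensor reading. The forecast package's `auto.arima()` and `forecast()` functions offer automated order selection and forecasting on top of the same kind of model.

```r
set.seed(11)
# Simulated, hypothetical sensor series: an ARMA(1,1) process around a mean of 60.
y <- 60 + arima.sim(model = list(ar = 0.7, ma = 0.3), n = 200)

# Fit an ARMA(1,1) model, i.e. ARIMA with d = 0 for a stationary series.
fit <- arima(y, order = c(1, 0, 1))
print(coef(fit))

# Forecast the next 7 observations together with their standard errors.
fc <- predict(fit, n.ahead = 7)
upper <- fc$pred + 1.96 * fc$se   # approximate 95% interval
lower <- fc$pred - 1.96 * fc$se
print(cbind(forecast = fc$pred, lower = lower, upper = upper))
```

For a non-stationary series, setting the middle element of `order` to 1 or more differences the data first, which is what turns the ARMA model into an ARIMA model.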