
Project: Forecasting Sales

Complete each section. When you are ready, save your file as a PDF document and submit it here:
https://classroom.udacity.com/nanodegrees/nd008/parts/edd0e8e8-158f-4044-9468-3e08fd08cbf8/project

Step 1: Plan Your Analysis


Look at your data set and determine whether the data is appropriate to use time series models.
Determine which records should be held for validation later on (250 word limit).
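
As a rough illustration of this planning step, the sketch below checks that a list of monthly period labels is sequential, evenly spaced, and free of duplicates, then holds out the most recent records. The function names, the `YYYY-MM` label format, and the sample periods are assumptions made for the example, not part of the project files. The holdout size is a parameter and would normally match the forecast horizon (four periods in this project).

```python
def month_index(period):
    """Convert a 'YYYY-MM' label to a running month count."""
    year, month = period.split("-")
    return int(year) * 12 + int(month) - 1


def is_regular_monthly(periods):
    """True when each label is exactly one month after the previous one.

    A gap, a duplicate, or an out-of-order label all break the
    one-month step, so this single check covers sequence, equal
    spacing, and at most one data point per month.
    """
    steps = [month_index(b) - month_index(a) for a, b in zip(periods, periods[1:])]
    return all(step == 1 for step in steps)


def split_holdout(records, n_holdout):
    """Return (training, holdout); holdout is the last n_holdout records."""
    if not 0 < n_holdout < len(records):
        raise ValueError("n_holdout must be between 1 and len(records) - 1")
    return records[:-n_holdout], records[-n_holdout:]


# Hypothetical period labels; the real data set covers more months.
periods = ["2013-01", "2013-02", "2013-03", "2013-04", "2013-05",
           "2013-06", "2013-07", "2013-08", "2013-09"]
training, holdout = split_holdout(periods, n_holdout=4)
print(is_regular_monthly(periods))
print(holdout)
```

The split is purely positional, so it only makes sense after the regularity check has passed and the records are sorted by date.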

Answer the following questions to help you plan out your analysis:
1. Does the dataset meet the criteria of a time series dataset? Make sure to explore all four
key characteristics of time series data.
The data is a time series because it has:
- a continuous time interval
- sequential measurements across that interval
- equal spacing between every two consecutive measurements
- at most one data point for each time unit within the time interval
: Awesome: Excellent!

2. Which records should be used as the holdout sample?
Because the manager wants a forecast of monthly sales, it is better to keep the
most recent 12 months of data as the holdout sample.
: Required: Please note that we are forecasting for the next 4 periods, therefore, we need 4 months as a
holdout sample. Please make sure that these four months are the last four months of the data set we
have: 2013-06 to 2013-09.

Step 2: Determine Trend, Seasonal, and Error components

Graph the data set and decompose the time series into its three main components: trend,
seasonality, and error. (250 word limit)
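
To make the three components concrete, here is a minimal sketch of a classical additive decomposition: a centered moving average estimates the trend, the average detrended value per month estimates the seasonality, and whatever is left is the error. This is a hand-rolled illustration on a synthetic series, not the procedure the project's plotting tool runs internally, and the function names are assumptions for the example.

```python
def centered_moving_average(values, period=12):
    """Trend estimate via a 2 x period centered moving average.

    Written for an even period (such as 12 for monthly data).
    Positions where the window does not fit are left as None.
    """
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        # The two end points share one season, so each gets half weight.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_component(values, trend, period=12, multiplicative=False):
    """Average detrended value for each position in the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(values, trend)):
        if level is None:
            continue
        detrended = value / level if multiplicative else value - level
        buckets[t % period].append(detrended)
    return [sum(bucket) / len(bucket) for bucket in buckets]


# Synthetic monthly series: linear trend plus a fixed seasonal pattern.
pattern = [-30, -20, -10, 0, 10, 20, 30, 20, 10, 0, -10, -20]
series = [100 + 2 * t + pattern[t % 12] for t in range(36)]

trend = centered_moving_average(series)
seasonal = seasonal_component(series, trend)
# Additive remainder: what trend and seasonality do not explain.
error = [value - level - seasonal[t % 12]
         for t, (value, level) in enumerate(zip(series, trend))
         if level is not None]
```

Passing `multiplicative=True` divides by the trend instead of subtracting it, which is the variant to inspect when the seasonal swings grow or shrink with the level of the series.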

Answer this question:

1. What are the trend, seasonality, and error of the time series? Show how you were able
to determine the components using time series plots. Include the graphs.

The trend of the bookings is linear. The seasonality increases year after year, but the
magnitude is not big; therefore, it is more additive than multiplicative. The error term
displays a multiplicative pattern, as the peaks increase quite fast.

: Required: Please note that the seasonal portion shows that the regularly occurring spike in sales each
year changes in magnitude, even if only slightly, rather than being constant. In Alteryx, we will need to hover
our mouse over the seasonal graph in Interface mode to be able to see that the seasonal numbers are
slightly increasing. This is important because:
- Having seasonality suggests that any ARIMA models used for analysis will need seasonal differencing.
- The change in magnitude suggests that any ETS models will use a multiplicative method in the
seasonal component.

Step 3: Build your Models

Analyze your graphs and determine the appropriate measurements to apply to your ARIMA and
ETS models and describe the errors for both models. (500 word limit)

Answer these questions:

: Awesome: Our trend line is confirmed as upward trending.

: Awesome: The error plot of the series presents fluctuations between large and smaller errors as the
time series goes on. Since the fluctuations are not consistent in magnitude, we will apply error in a
multiplicative manner for any ETS models.

1. What are the model terms for ETS? Explain why you chose those terms.
a. Describe the in-sample errors. Use at least RMSE and MASE when examining
results

The model terms for ETS are: trend additive and damped, because in the picture above the
line starts to flatten toward the right of the picture; error multiplicative, because the peaks
of the plot increase fast period after period; and seasonality additive, because the season shows an
increase in magnitude, but the increase is slow => ETS(M,A,A)

: Awesome: The error and trend terms are correct - well done!

: Required: Since the interpretation of the decomposition plot above is not accurate, as I
mentioned above, the terms for the ETS model are not accurately identified. Specifically, the method for
seasonality is not correct. The seasonality changes in magnitude each year, so a
multiplicative method is necessary.

The results below come from running the model. The data has a minimum
value of 51,000 and a mean of more than 276,000; compared with error values
of about 30,000, the model is acceptable. However, 30,000 is still a very big number, so
the model needs to be checked again by comparing it with another model. The AIC and BIC
values are also quite small, but we still need to compare them with another model to get a
more credible result. Besides, if we use the graph to forecast the nearest 12 months of sales,
the result follows the actual sales pattern quite well, but the amounts are all smaller than the
actual values.

: Required: The in-sample errors are incorrect since the seasonality term is incorrect, and probably because 12
periods were used as a holdout sample instead of 4. Please correct the issues and re-calculate them.
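
For reference when recalculating, the sketch below shows how RMSE and MASE are commonly defined. MASE scales the mean absolute error by the in-sample mean absolute error of a naive forecast. Whether the tool used in this project scales by a one-step or a seasonal naive forecast is not stated here, so the lag `m` is left as a parameter. The function names and the tiny input lists are assumptions for illustration only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mase(actual, predicted, training, m=1):
    """Mean absolute scaled error.

    The scale is the mean absolute error of a naive forecast on the
    training data that repeats the value from m periods earlier
    (m=1: previous period, m=12: same month last year).
    """
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    naive_errors = [abs(training[t] - training[t - m])
                    for t in range(m, len(training))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale


# Toy numbers chosen so the arithmetic is easy to follow by hand.
training = [2, 4, 6, 8]
actual = [10, 12, 14]
predicted = [11, 12, 12]
print(rmse(actual, predicted))
print(mase(actual, predicted, training))
```

A MASE below 1 means the model's average error is smaller than that of the naive forecast on the training data, which makes it easier to compare models than RMSE alone.
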
2. What are the model terms for ARIMA? Explain why you chose those terms. Graph the
Auto-Correlation Function (ACF) and Partial Autocorrelation Function Plots (PACF) for
the time series and seasonal component and use these graphs to justify choosing your
model terms.
a. Describe the in-sample errors. Use at least RMSE and MASE when examining
results
b. Regraph ACF and PACF for both the Time Series and Seasonal Difference and
include these graphs in your answer

: Required: As it is said in the question here, please include all of these plots:
- Time Series ACF and PACF
- Seasonal Difference ACF and PACF
- Seasonal First Difference ACF and PACF
in the order listed. These plots need to be used to support your choice of the terms of the ARIMA model.
Then, after establishing the correct ARIMA model, we should regraph ACF and PACF for both the Time Series
and Seasonal Difference and include these graphs in our answer. The ACF and PACF results for the correct
ARIMA model should show no significantly correlated lags, suggesting no need for adding additional AR() or
MA() terms.
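
The three stages named above (original series, seasonal difference, seasonal first difference) can be sketched as follows, together with a sample ACF and the usual approximate 95% significance bound of 1.96 / sqrt(n). This is a plain-Python illustration on a synthetic series. The function names are assumptions, and the PACF is not computed here.

```python
import math


def difference(values, lag=1):
    """Subtract the value observed `lag` periods earlier."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def acf(values, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    return [sum((values[t] - mean) * (values[t - k] - mean)
                for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def significance_bound(n):
    """Approximate 95% bound for the ACF of white noise of length n."""
    return 1.96 / math.sqrt(n)


# Synthetic monthly series: trend + seasonal pattern + small irregular part.
pattern = [-30, -20, -10, 0, 10, 20, 30, 20, 10, 0, -10, -20]
wiggle = [0, 3, -2, 1, -1]
series = [100 + 2 * t + pattern[t % 12] + wiggle[t % 5] for t in range(48)]

seasonal_diff = difference(series, lag=12)               # removes the yearly pattern
seasonal_first_diff = difference(seasonal_diff, lag=1)   # removes remaining trend

bound = significance_bound(len(seasonal_first_diff))
for lag, r in enumerate(acf(seasonal_first_diff, max_lag=12)):
    flag = "significant" if lag > 0 and abs(r) > bound else ""
    print(lag, round(r, 3), flag)
```

Lags whose autocorrelation falls outside the bound are the ones that would motivate AR or MA terms. Once the model is fitted, the same check on the residuals should flag none.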

: Awesome: Nice work including the Seasonal First Difference ACF and PACF plots.

After taking the seasonal difference and then the first difference of the seasonal difference, the data
is stationary. The seasonal pattern present in the undifferenced series has been
eliminated. Thus the data needs one first difference and one seasonal difference. The graph
of the differenced data shows that we only need an MA(1) term, as the ACF and
PACF both have negative values and are only significant at the first lag. In conclusion, the
ARIMA model is (0,1,1)(0,1,0)

: Suggestion: Correct, just please also note that we have monthly data, which means that m = 12. =>
ARIMA(0,1,1)(0,1,0)[12]

The result is quite good, as the model has lower AIC and BIC values than the
ETS. But the RMSE is bigger.

: Required: The in-sample errors are incorrect, probably because 12 periods were used as a holdout
sample instead of 4. Please correct the issues and re-calculate them.

The result from using ARIMA to forecast the holdout sample seems very promising, as
the predicted values closely follow the real ones.

Step 4: Forecast
Compare the in-sample error measurements to both models and compare error measurements
for the holdout sample in your forecast. Choose the best fitting model and forecast the next four
periods. (250 words limit)

Answer these questions.

1. Which model did you choose? Justify your answer by showing: in-sample error
measurements and forecast error measurements against the holdout sample.

The two tables below show the forecast errors from the ETS (the first table) and the
ARIMA (the second one). The previous part showed that it is difficult to tell from the
training data which of ETS and ARIMA performs better, as
ETS has the better RMSE while ARIMA has the better AIC and BIC.
However, in the holdout sample, the situation is different: ARIMA outperforms on all measures.
Therefore, the good performance of ETS might be caused by overfitting and we should
choose the ARIMA model to forecast sales.

: Awesome: ARIMA is the best performing model - well done!

: Required: These accuracy measures are a little bit off. Make sure to apply all of the changes that I suggested
above and then re-run the forecast error measurements against the holdout sample. Also,
please note that again we need to include the holdout
samples as a comparison between the ETS and ARIMA models.

2. What is the forecast for the next four periods? Graph the results using 95% and 80%
confidence intervals.
The forecasted result for the next four periods is shown below.

: Awesome: Excellent! This is a surprise considering the issues that I mentioned above. This is the hardest
part for students normally - nearly everybody gets them wrong - so great job achieving the correct results!
:)

: Suggestion: It would be really nice if you can zoom in on the part that represents the next four periods from
October 2013 until January 2014.
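
As a small aid for reading the 95% and 80% bands, the sketch below builds an interval around a point forecast, assuming normally distributed forecast errors and a standard error supplied by the fitted model. The function name is an assumption, and the forecast values and standard errors are placeholders for illustration only, not results from this project.

```python
from statistics import NormalDist


def forecast_interval(point_forecast, std_error, level):
    """Symmetric interval around a forecast, assuming normal errors.

    level is the coverage as a fraction, e.g. 0.95 or 0.80.
    """
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 1.28 for 80%
    margin = z * std_error
    return point_forecast - margin, point_forecast + margin


# Placeholder numbers only; substitute the model's forecasts and standard errors.
periods = ["2013-10", "2013-11", "2013-12", "2014-01"]
forecasts = [100.0, 110.0, 120.0, 130.0]
std_errors = [10.0, 12.0, 14.0, 16.0]

for period, point, se in zip(periods, forecasts, std_errors):
    low95, high95 = forecast_interval(point, se, 0.95)
    low80, high80 = forecast_interval(point, se, 0.80)
    print(period, round(low95, 1), round(low80, 1), point,
          round(high80, 1), round(high95, 1))
```

The 80% band always sits inside the 95% band, and both widen when the standard error grows with the forecast horizon.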

Before you Submit

Please check your answers against the requirements of the project dictated by the rubric here.
Reviewers will use this rubric to grade your project.
