Introduction to Time Series Analysis and Forecasting- II


In this section we discuss the forecasting results of Prophet and LSTM on our sales data set.

Note: This is a continuation of part 1, where we discussed seasonality, trends and the results for ARIMA. If you haven't read it yet, please go through it first for a better understanding of this part.

part 1:

PROPHET

Released by Facebook in 2017, Prophet is a forecasting tool designed for analyzing time series that display patterns on different time scales, such as yearly, weekly and daily. It also has advanced capabilities for modeling the effects of holidays on a time series and for implementing custom changepoints. We therefore use Prophet to get a model up and running.

“Prophet has been a key piece to improving Facebook’s ability to create a large number of trustworthy forecasts used for decision-making and even in product features.”

It is a decomposable time series model with three main components: trend, seasonality, and holidays. They are combined in the following equation:

y(t) = g(t) + s(t) + h(t) + εt

  • g(t): piecewise linear or logistic growth curve for modelling non-periodic changes in time series
  • s(t): periodic changes (e.g. weekly/yearly seasonality)
  • h(t): effects of holidays (user provided) with irregular schedules
  • εt: error term accounts for any unusual changes not accommodated by the model
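
To make the additive structure concrete, here is a minimal sketch in plain Python that builds a toy monthly series from a trend, a yearly seasonal term, a holiday bump and noise. All numbers and the helper names `g`, `s` and `h` are invented for illustration; this is not how Prophet estimates these components, only how they add up.

```python
import math
import random

random.seed(0)  # make the noise reproducible

def g(t):
    """Toy linear trend: starts at 100 and falls by 1 unit per month."""
    return 100.0 - 1.0 * t

def s(t):
    """Toy yearly seasonality with a 12-month period."""
    return 10.0 * math.sin(2 * math.pi * t / 12)

def h(t):
    """Toy holiday effect: a fixed bump every 12th month (t = 11, 23, ...)."""
    return 25.0 if t % 12 == 11 else 0.0

months = range(34)  # 34 monthly observations
noise = [random.gauss(0, 2) for _ in months]

# additive model: y(t) = g(t) + s(t) + h(t) + error
y = [g(t) + s(t) + h(t) + e for t, e in zip(months, noise)]

# subtracting the known components leaves only the error term
residual = [y_t - g(t) - s(t) - h(t) for t, y_t in zip(months, y)]
```

Because the model is additive, each component can be inspected on its own, which is what the component plots in Prophet rely on.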

Trend

Trend is the smooth, regular, long-term movement of a statistical series. Frequent changes, either in absolute amount or in the rate of increase or decrease, are inconsistent with the idea of a secular trend; in other words, a trend has no periodicity.

SEASONAL VARIATION

A periodic movement is one which recurs with some degree of regularity within a definite time period.

HOLIDAYS

Holidays and events cause predictable shifts in a time series. For example, clothing sales around New Year are quite high compared to regular days, and the same holds for Diwali (in India) and other occasions.
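
Prophet takes user-provided holidays as a DataFrame with a `holiday` column and a `ds` column, plus optional `lower_window` and `upper_window` columns. The sketch below only builds such a frame with pandas; the choice of holidays, the window sizes and the variable names are assumptions for illustration, and holiday effects of this kind are mostly useful for daily data rather than the monthly series used in this article.

```python
import pandas as pd

# Illustrative dates only: New Year's Day and Diwali for 2013-2015.
new_year = pd.DataFrame({
    "holiday": "new_year",
    "ds": pd.to_datetime(["2013-01-01", "2014-01-01", "2015-01-01"]),
    "lower_window": 0,   # no extra days before the date
    "upper_window": 1,   # also treat the following day as affected
})
diwali = pd.DataFrame({
    "holiday": "diwali",
    "ds": pd.to_datetime(["2013-11-03", "2014-10-23", "2015-11-11"]),
    "lower_window": -2,  # also treat the two preceding days as affected
    "upper_window": 1,
})
holidays = pd.concat([new_year, diwali], ignore_index=True)

# With Prophet installed, the frame would be passed as:
#   model = Prophet(holidays=holidays)
```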

OK, let’s jump into the code:

###prophet

import pandas as pd

# total items sold per month; date_block_num is a consecutive month index
ts=df.groupby(["date_block_num"])["item_cnt_day"].sum()

# replace the month index with the first day of each month
ts.index=pd.date_range(start = '2013-01-01',end='2015-10-01', freq = 'MS')

ts=ts.reset_index()

# Prophet requires a pandas DataFrame with the date column named ds and the value column named y
ts.columns=['ds','y']

ts.head()

# in Prophet 1.0 and later the package is named prophet: from prophet import Prophet
from fbprophet import Prophet

# instantiate Prophet with only yearly seasonality, as our data is monthly
model = Prophet(yearly_seasonality=True)

model.fit(ts)

# extend the frame by 5 future months
future = model.make_future_dataframe(periods = 5, freq = 'MS')

# now let's make the forecasts
forecast = model.predict(future)

forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Let’s plot:

model.plot(forecast)

Some limitations:

  1. In most cases Facebook does not use Prophet for very large time series. The quantities being forecast are usually aggregate measures over some number of days.
  2. If your data is very large (>5 M), first estimate the seasonalities and check where the trend is heading. Sampling may be a reasonable thing to do.
  3. If you get an error saying that ds or y is not in the DataFrame, or something similar, first check that the columns are named exactly ds and y (lowercase); if they are, restart the kernel and run the code again.

If you want to see the forecast components, you can plot them with model.plot_components(forecast).

You will see the trend and the yearly seasonality of the time series (and weekly seasonality and holidays, when the model includes them).

LSTM(Long Short Term Memory)

Sequence prediction problems have been around for a long time. They are considered among the hardest problems to solve in the data science industry. They cover a wide range of tasks: from predicting sales to finding patterns in stock market data, from understanding movie plots to recognizing your speech, from language translation to predicting the next word on your iPhone’s keyboard.

The architecture of LSTM will not be discussed in detail here. It will be covered in a separate article of the Deep Learning Series.

Our approach:

Our features are the number of items sold per month for each shop/item pair, excluding the last month, because the last month serves as our label; this helps the model learn to predict the next value in the sequence. For testing we use the same monthly counts excluding the first month, so that the dimensions of the data stay the same. The model then predicts the next value in the sequence, and that is our result.
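
The windowing idea can be sketched on a tiny hand-made table before applying it to the real data. The `sales` list below is invented toy data (3 series, 5 months), not the actual sales set.

```python
# toy data: 3 shop/item pairs, 5 months of sales counts each
sales = [
    [0, 1, 2, 3, 4],
    [5, 5, 5, 5, 5],
    [9, 7, 5, 3, 1],
]

# training features: every month except the last
X_train = [row[:-1] for row in sales]
# training labels: the last month
y_train = [row[-1] for row in sales]
# test features: every month except the first, so the window length stays the same
X_test = [row[1:] for row in sales]

print(X_train[0], y_train[0], X_test[0])
# prints: [0, 1, 2, 3] 4 [1, 2, 3, 4]
```

Both feature windows have the same length, so the same model can consume either of them.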

Note: you can also try a different way of splitting the data.

Let’s discuss how LSTM works and what we really need.

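Step 1: construct a pivot table

# one row per (shop_id, item_id) pair, one column per month, missing months filled with 0
dataset = df.pivot_table(index = ['shop_id','item_id'],values = ['item_cnt_day'],columns = ['date_block_num'],fill_value = 0,aggfunc='sum')

Step 2: build the training and test arrays

import numpy as np

# training features: all columns except the last, with a trailing feature axis
X_train = np.expand_dims(dataset.values[:,:-1],axis = 2)

# the last column is our label
y_train = dataset.values[:,-1:]

# for test we keep all the columns except the first one
X_test = np.expand_dims(dataset.values[:,1:],axis = 2)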

Let’s have a look at the shapes:

print(X_train.shape,y_train.shape,X_test.shape)

>>>(424124, 33, 1) (424124, 1) (424124, 33, 1)

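# note: this uses the first month as the test label; the month that actually follows X_test is not in the data
y_test=dataset.values[:,:1]

print(y_test.shape)

>>>(424124, 1)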
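
One way to get a label that truly follows its input window is to hold out the last observed month for validation and train on a window that ends one month earlier. The sketch below shows this on random toy data rather than on the real pivot table; the array `data` and the names `X_val` and `y_val` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for dataset.values: 6 series, 34 monthly counts each
data = rng.integers(0, 5, size=(6, 34))

# train: months 0..31 as input, month 32 as label
X_train = np.expand_dims(data[:, :-2], axis=2)
y_train = data[:, -2:-1]

# validation: months 1..32 as input, month 33 (the last observed) as label
X_val = np.expand_dims(data[:, 1:-1], axis=2)
y_val = data[:, -1:]

print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)
# prints: (6, 32, 1) (6, 1) (6, 32, 1) (6, 1)
```

With a split like this the windows are 32 steps long, so the model's input shape would be (32, 1) instead of (33, 1).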

Note: Why does LSTM need 3-dimensional input?

>>> An LSTM layer is a recurrent layer, hence it expects 3-D input of shape (batch_size, time steps, input dimension).

>>> As a rule of thumb, LSTMs are reported to learn well over up to roughly 100 time steps, so feeding much longer sequences will not necessarily give better results.

>>> If your data is not 3-dimensional, reshape it with the ‘.reshape’ method.
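
As a small illustration of the reshaping step, the sketch below turns a 2-D array into the (samples, time steps, features) layout that Keras recurrent layers expect. The array `flat` is made-up data; `reshape` and `np.expand_dims` give the same result here.

```python
import numpy as np

# 4 samples, 33 time steps each, as a plain 2-D array
flat = np.arange(4 * 33).reshape(4, 33)

# add a trailing feature axis: (samples, time steps, features)
with_reshape = flat.reshape(flat.shape[0], flat.shape[1], 1)
with_expand = np.expand_dims(flat, axis=2)

print(with_reshape.shape)
# prints: (4, 33, 1)
```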

OK, let’s jump into the model building part.

from keras.models import Sequential
from keras.layers import LSTM,Dense,Dropout

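my_model = Sequential()
# 64 LSTM units reading 33 time steps with 1 feature each
my_model.add(LSTM(units = 64,input_shape = (33,1)))
# drop 40% of the LSTM outputs during training to reduce overfitting
my_model.add(Dropout(0.4))
# single output: the predicted item count for the next month
my_model.add(Dense(1))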

my_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 64)                16896     
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
=================================================================
Total params: 16,961
Trainable params: 16,961
Non-trainable params: 0
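
The parameter counts in the summary can be checked by hand. The sketch below assumes a standard LSTM layer with four gates, each having input weights, recurrent weights and a bias vector; the helper names `lstm_params` and `dense_params` are made up for this check.

```python
def lstm_params(units, input_dim):
    """Weights of a standard LSTM layer: 4 gates, each with input weights,
    recurrent weights and a bias vector."""
    return 4 * (units * input_dim + units * units + units)

def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs

lstm = lstm_params(units=64, input_dim=1)
dense = dense_params(inputs=64, outputs=1)
print(lstm, dense, lstm + dense)
# prints: 16896 65 16961
```

The Dropout layer has no weights, which is why it shows 0 parameters.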

my_model.compile(loss = 'mse',optimizer = 'adam', metrics = ['mean_squared_error'])

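Note: accuracy is a classification metric and is not meaningful for a regression target like this one. If you want to track an additional error metric alongside the loss, use something like mean absolute error:

####my_model.compile(loss = 'mse',optimizer = 'adam', metrics = ['mae'])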

history=my_model.fit(X_train,y_train,batch_size = 4096,epochs = 10)

y_pred=my_model.predict(X_test)

from sklearn.metrics import mean_squared_error
from numpy import sqrt
rmse = sqrt(mean_squared_error(y_test,y_pred))
print('Val RMSE: %.3f' % rmse)

>>>

Val RMSE: 1.595

Note: RMSE is the square root of the mean of the squared residuals. It indicates the absolute fit of the model to the data, i.e. how close the observed data points are to the model’s predicted values.

Lower values of RMSE indicate a better fit.
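
For intuition, RMSE can also be computed by hand. This is a minimal sketch in plain Python on made-up numbers; the function name `rmse` and the two lists are assumptions for illustration, not values from the sales data.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Square root of the mean squared difference between two sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    squared_errors = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))

observed = [3.0, 0.0, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]
print(round(rmse(observed, predicted), 3))
# prints: 0.559
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones.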

End points:

>>> A number of issues are important prior to any univariate analysis of a time series:

  1. The order of integration of the series should be determined. The DF or ADF (Augmented Dickey-Fuller) test is often used to assist in this process. For many time series, a first-order difference is sufficient to make the series stationary.
  2. LSTM tends to work better when a large amount of data is available, while ARIMA and Prophet are better suited to smaller data sets.
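
First-order differencing itself is a one-liner. The sketch below uses a made-up series with a steady upward trend; the helper name `difference` is an assumption. A stationarity test such as ADF (for example `adfuller` from statsmodels) could be run before and after, but is not shown here.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# toy series with a steady upward trend of 3 units per step
trending = [10 + 3 * t for t in range(8)]
diffed = difference(trending)
print(diffed)
# prints: [3, 3, 3, 3, 3, 3, 3]
```

The linear trend disappears after one difference, leaving a constant series that is one observation shorter.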

Thank You!!!!!

Souvik Manna

ISI Bangalore
