Time Series Analysis and Forecasting with R, Beginners' Lessons

Welcome back to the blog. In this post I give a simple introduction to time series analysis with R. It is meant as a baseline for a deeper understanding of time series analysis, machine learning or deep learning in the context of data science or econometrics. So let's get started.

Definition

A time series is a variable measured sequentially at regular time intervals. Time series are commonly used in economics, finance, health, environmental analysis, gaming, technology, and so on.

One of the main uses of time series is forecasting the variables being analyzed. Those variables can be a company's Key Performance Indicators (KPIs), such as demand, sales or Return on Investment (ROI), or weather variables. In financial markets, we can forecast the price of a specific stock or index.

Create a time series with the ts() function in R

There are multiple ways to create a time series in R. In this case, we use the ts() function. Here is the signature of the ts() function.
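For reference, the signature below is the one given on the ?ts help page of the stats package, reproduced here as a comment. It may differ slightly between R versions, so the last two lines list the argument names of the version actually installed.

```r
# Signature documented in ?ts (stats package, loaded by default):
# ts(data = NA, start = 1, end = numeric(), frequency = 1, deltat = 1,
#    ts.eps = getOption("ts.eps"), class = , names = )

# List the argument names of ts() on the installed version of R
ts_args <- names(formals(stats::ts))
print(ts_args)
```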

where:

1. data is a vector or a matrix containing the values of the time series.

2. start is the time of the first observation.

3. end is the time of the last observation.

4. frequency is the number of observations per unit of time (for example, 4 for quarterly data and 12 for monthly data when the unit is a year).

5. ts.eps is the time series comparison tolerance.

6. class is the class given to the result. "ts" is used for a single series and "mts" for multiple time series.

7. names is a character vector of names for the series in a multiple series; by default, the column names of data are used.

Example

We want to create a time series of quarterly values from a vector of 32 values, starting from 1959. Here is the code.
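A minimal sketch of this example follows. The 32 values are placeholders (the integers 1 to 32), and the series is assumed to start in the first quarter of 1959, since the post does not say which quarter it starts in.

```r
# Placeholder data: 32 made-up values (the original vector is not shown)
values <- 1:32

# Quarterly series: 4 observations per year, first observation in 1959 Q1
# (starting in the first quarter is an assumption)
quarterly <- ts(values, start = c(1959, 1), frequency = 4)
print(quarterly)
```

With 32 quarterly values starting in 1959 Q1, the series ends in 1966 Q4, and print() lays it out with one row per year and one column per quarter.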

Here is what we obtain from the code.

How to decompose a time series?

Each value in a time series can be decomposed as a sum or a product of three terms: the trend, the seasonality and the error term. If a time series is decomposed as the sum of the three terms, it is called an additive series. If it is decomposed as their product, it is called a multiplicative series.
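To make the two forms concrete, here is a small sketch that builds one series of each kind from the same three terms. All numbers (level, slope, amplitude, noise) are made up for illustration.

```r
set.seed(42)                      # make the random error reproducible
n <- 48                           # 4 years of monthly observations
t_index <- seq_len(n)

trend    <- 100 + 0.5 * t_index                # slowly increasing level
seasonal <- 10 * sin(2 * pi * t_index / 12)    # pattern repeating every 12 months
error    <- rnorm(n, sd = 2)                   # random noise

# Additive series: the three terms are summed
additive <- ts(trend + seasonal + error, frequency = 12)

# Multiplicative series: the terms are multiplied, with the seasonal and
# error terms expressed as factors around 1
multiplicative <- ts(trend * (1 + seasonal / 100) * (1 + error / 100),
                     frequency = 12)
```

In the additive series the seasonal swings keep the same size as the level rises; in the multiplicative one they grow with the level.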

Usually, when we analyze a time series, we are interested in the stationarity property of the series. Let us define the stationarity of a time series.

When do we say a time series is stationary?

A time series is stationary if the following properties hold.

1. The mean of the series does not vary over time.

2. The standard deviation of the series does not vary over time.

3. The covariance between two values of the series depends only on the lag between them.

In other words, there is no trend, no change in variability and no seasonality effect in the time series.
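As an informal illustration of the first two properties (not a formal stationarity test), one can compare the mean and standard deviation of the first and second halves of a series. The helper half_stats() below is hypothetical and written only for this sketch, and both series are simulated.

```r
# Hypothetical helper: mean and standard deviation of each half of a series
half_stats <- function(x) {
  n <- length(x)
  first  <- x[seq_len(n %/% 2)]
  second <- x[(n %/% 2 + 1):n]
  c(mean_first = mean(first), mean_second = mean(second),
    sd_first = sd(first), sd_second = sd(second))
}

set.seed(1)
white_noise <- rnorm(500)            # stationary by construction
random_walk <- cumsum(rnorm(500))    # non-stationary: its variance grows over time

print(half_stats(white_noise))
print(half_stats(random_walk))
```

For a stationary series the two halves should give similar numbers; large differences hint at a trend or a change in variability.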

Example of time series decomposition with Amazon stock

To get the trend, seasonality and error term, we can use the "decompose()" function. To illustrate this, I use the Amazon stock data (ticker symbol AMZN) downloaded from Yahoo Finance. The data run from 01/01/2000 to 12/31/2019 at daily frequency.

Once the data are downloaded and saved on the local computer, I use the read.csv() function to read the file. Then I create my time series, called tsDATA, with the ts() function. Finally, I use the decompose() function to decompose the series. Here is the code for the decomposition.
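The sketch below shows the shape of these steps. To keep it runnable without the downloaded file, it uses a simulated stand-in for the closing prices; the file name "AMZN.csv", the Close column and the frequency of 252 trading days per year are all assumptions, not details taken from the original code.

```r
# Stand-in for the AMZN closing prices so the sketch runs without the file.
# With the real data, something like read.csv("AMZN.csv")$Close would go here
# (file name and column name are assumptions about the Yahoo Finance export).
set.seed(123)
n_days <- 252 * 4                                   # about 4 years of trading days
close  <- 50 + cumsum(rnorm(n_days, mean = 0.05)) +
  3 * sin(2 * pi * seq_len(n_days) / 252)

# Assumption: one seasonal cycle per year of roughly 252 trading days
tsDATA <- ts(close, frequency = 252)

# Classical decomposition into trend, seasonal and random components
decomp <- decompose(tsDATA, type = "additive")
plot(decomp)
```

decompose() needs a frequency greater than 1 and at least two full periods of data, which is why the frequency has to be set when calling ts().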

When I plot the decomposition results of the time series, I obtain the following result.

Calculate the lag of a time series

So far we have discussed time series decomposition. Now, let's talk about creating the lag of a time series. The lag of a time series is commonly used to identify the relationship between the series and its own past values. To calculate the lag of a time series, we can use the "lag()" function. For example, if we want the Amazon stock series lagged by 10 days, we use the following code.

To study the relationship between a series and its lags and leads, one step could be to create those variables and combine them in a single data frame. In the example below, I create 3-day lag and 3-day lead variables and combine them all together. For this, we need to install the package named "DataCombine". Here is the code.
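A minimal sketch of a 10-period lag with stats::lag() is shown here on a short made-up series rather than the AMZN data. The sign of k is the part that is easy to get wrong: the function shifts the time index, so a negative k is what delays the series.

```r
# Small stand-in series (the post uses the AMZN close prices)
x <- ts(c(10, 12, 11, 15, 14, 18, 17, 20, 22, 21, 25, 24), start = 1)

# stats::lag() shifts the time index, not the values.
# A negative k moves the series forward in time, so the value seen at
# time t is the original value from 10 periods earlier.
x_lag10 <- stats::lag(x, k = -10)

# Align both series on a common time axis to see the shift
aligned <- ts.union(original = x, lag10 = x_lag10)
print(aligned)
```

Writing stats::lag() explicitly avoids surprises when another package that also defines lag(), such as dplyr, is loaded.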
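The post relies on the DataCombine package for this step. The sketch below swaps in plain base R instead, so that it runs without installing anything; shift() is a hypothetical helper written for this illustration, and the prices are made up.

```r
# Hypothetical helper: shift a vector by k positions, padding with NA.
# k > 0 gives a lag (past values), k < 0 gives a lead (future values).
shift <- function(x, k) {
  n <- length(x)
  if (k > 0) {
    c(rep(NA, k), head(x, n - k))
  } else if (k < 0) {
    c(tail(x, n + k), rep(NA, -k))
  } else {
    x
  }
}

# Stand-in for the close prices
prices <- data.frame(day = 1:8, close = c(10, 12, 11, 15, 14, 18, 17, 20))

prices$close_lag3  <- shift(prices$close, 3)    # value from 3 days earlier
prices$close_lead3 <- shift(prices$close, -3)   # value from 3 days later
print(prices)
```

The first three rows of the lag column and the last three rows of the lead column are NA, because no earlier or later value exists there.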

And we obtain the following data frame.

Autocorrelations and Partial autocorrelations analysis

Autocorrelation and partial autocorrelation analysis is one of the most popular ways to examine the stationarity of a time series. Basically, it looks at the correlations between a time series and its own lags. If a series is non-stationary, it also helps us identify the order of the autocorrelation, that is, approximately how many lags of the series are correlated with its current value. The functions "acf()" and "pacf()" plot the autocorrelations and the partial autocorrelations, respectively. Here is the code.
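Here is a minimal sketch of both calls. It uses a simulated AR(1) series as a stand-in for the AMZN close prices, so the resulting figures will not match the ones in this post.

```r
# Stand-in series: a simulated AR(1) process (the post uses AMZN close prices)
set.seed(2020)
x <- arima.sim(model = list(ar = 0.7), n = 500)

# plot = FALSE returns the estimates without drawing, which is handy for
# inspecting the numbers before plotting them
acf_res  <- acf(x, lag.max = 20, plot = FALSE)
pacf_res <- pacf(x, lag.max = 20, plot = FALSE)

# Approximate 95% bounds, drawn as dashed lines on the default plots
bound <- qnorm(0.975) / sqrt(length(x))

plot(acf_res,  main = "ACF")
plot(pacf_res, main = "PACF")
```

The ACF starts at lag 0, where the correlation is always 1, while the PACF starts at lag 1.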

And we obtain the following figure for ACF

and for PACF

As only the first bar of the PACF falls outside the confidence bands, we can say that the Amazon time series is autocorrelated of order one.

Detrending a time series

When we notice that a time series displays a trend, we may want to remove the trend effect before running the econometric analysis. To remove the trend, one strategy is to run a simple linear regression of the time series variable (Amazon close price) on the time variable and take the residuals of the regression as the detrended variable. We use the following code.

Here is the plot of the detrended variable of the Amazon close price.
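The regression approach can be sketched as follows, with a simulated series (a linear trend plus noise) standing in for the close price. The variable names are chosen for this illustration only.

```r
# Stand-in for the close prices: a linear trend plus noise
set.seed(7)
n       <- 200
t_index <- seq_len(n)
close   <- 20 + 0.3 * t_index + rnorm(n, sd = 5)

# Regress the series on time and keep the residuals as the detrended series
fit       <- lm(close ~ t_index)
detrended <- residuals(fit)

plot(t_index, detrended, type = "l",
     xlab = "Time", ylab = "Detrended close price")
```

By construction, the residuals have mean zero and are uncorrelated with the time variable, so any linear trend is gone. A curved trend would need a different regression.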

De-seasonalize a time series

To remove the seasonal effect, we first decompose the time series and then remove the seasonal component from the original series with a simple subtraction. Here is the code.
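A minimal sketch of this subtraction, on a simulated monthly series rather than the AMZN data, could look like this. It assumes an additive decomposition; for a multiplicative one, the series would be divided by the seasonal component instead.

```r
# Stand-in monthly series with a trend and a yearly seasonal pattern
set.seed(99)
t_index <- seq_len(72)
x <- ts(50 + 0.4 * t_index + 8 * sin(2 * pi * t_index / 12) + rnorm(72),
        frequency = 12)

# Additive decomposition, then subtract the seasonal component
decomp         <- decompose(x, type = "additive")
deseasonalized <- x - decomp$seasonal

plot(deseasonalized, ylab = "De-seasonalized series")
```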

The de-seasonalized time series is presented in the following figure

Differencing a time series variable

Differencing is widely used in time series analysis, machine learning and data science. In the context of time series analysis, it is used to remove stochastic trend effects. In R, we use the "diff()" function for this purpose. Here is the code for differencing the time series.
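As a sketch, here is a first difference computed on a simulated random walk, which stands in for the close price and has a stochastic trend by construction.

```r
# Stand-in series: a random walk, which has a stochastic trend
set.seed(11)
x <- ts(cumsum(rnorm(300)))

# First difference: x[t] - x[t-1]; the result is one observation shorter
dx <- diff(x, lag = 1, differences = 1)

plot(dx, ylab = "First difference")
```

Setting differences = 2 applies the operation twice, which is sometimes needed when one pass does not remove the trend.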

Hence, we obtain the following figure

So that's all for this post. There are many other features to explore, but I am saving them for another post. I hope you liked it. If you did, please share it with your friends and your machine learning and data science community.

Idriss TSAFACK, Ph.D.