A prediction interval provides a measure of uncertainty for predictions on regression problems.

For example, a 95% prediction interval means that 95 out of 100 times, the true value will fall between the lower and upper limits of the range. This is different from a simple point prediction, which may represent the center of the uncertainty interval. There is no standard technique for calculating a prediction interval for deep learning neural networks on regression predictive modeling problems. However, an ensemble of models can be used to estimate a quick-and-dirty prediction interval: the ensemble provides a distribution of point predictions from which an interval can be calculated.

In this tutorial, you will discover how to calculate a prediction interval for a deep learning neural network. After completing this tutorial, you will know:

- The prediction interval provides a measure of uncertainty for regression prediction modeling problems.
- How to develop and evaluate a simple multilayer perceptron neural network on standard regression problems.
- How to use an ensemble of neural network models to calculate and report prediction intervals.

**Tutorial overview**

This tutorial is divided into three parts; they are:

- Prediction interval
- Neural network for regression
- Neural network prediction interval

**Prediction interval**

Typically, predictive models for regression problems make point predictions. That is, they predict a single value but give no indication of the uncertainty of that prediction. By definition, a prediction is an estimate or approximation and contains some uncertainty. The uncertainty comes both from the errors of the model itself and from noise in the input data, as the model is only an approximation of the relationship between the input and output variables. A prediction interval quantifies the uncertainty of a prediction: it provides probabilistic upper and lower bounds on the estimate of the outcome variable.

Prediction intervals are most commonly used when making predictions or forecasts with a regression model, where a quantity is being predicted. The prediction interval surrounds the prediction made by the model and hopefully covers the range of the true outcome. For more on prediction intervals in general, see the tutorial:

"The prediction interval of machine learning":

Now that we are familiar with prediction intervals, let's consider how to calculate an interval for a neural network. We will first define a regression problem and a neural network model to address it.

**Neural network for regression**

In this section, we will define a regression predictive modeling problem and a neural network model to address it. First, let's introduce a standard regression dataset: the housing dataset. The housing dataset is a standard machine learning dataset comprising 506 rows of data with 13 numerical input variables and a numerical target variable.

The dataset involves predicting house prices given details of suburbs of Boston in the United States. Using a test harness of 10-fold cross-validation with three repeats, a naive model can achieve a mean absolute error (MAE) of about 6.6, while a top-performing model can achieve an MAE of about 1.9 on the same harness. This provides bounds on the expected performance for this dataset.
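The naive baseline can be reproduced with scikit-learn's DummyRegressor, which always predicts the mean of the training target. The sketch below shows the test harness on synthetic stand-in data (so it runs on its own); substituting the housing inputs and target should yield an MAE near the 6.6 quoted above.

```python
from numpy import mean, std
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

# synthetic stand-in for the housing data (506 rows, 13 inputs)
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=1)
# naive model: always predict the mean of the training target
model = DummyRegressor(strategy='mean')
# 10-fold cross-validation with 3 repeats, scored by (negated) MAE
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv)
print('Baseline MAE: %.3f (%.3f)' % (mean(-scores), std(-scores)))
```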

Housing dataset (housing.csv):

Housing dataset description (housing.names):

There is no need to download the dataset; as part of our working example, we will download it automatically.

The following example downloads and loads the dataset as a Pandas DataFrame and outlines the shape of the dataset and the first five rows of the data.

```
# load and summarize the housing dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)
# summarize first few lines
print(dataframe.head())
```

Running the example confirms the 506 rows of data with 13 input variables and a single numerical target variable (14 columns in total). We can also see that all input variables are numeric.

```
(506, 14)
        0     1     2  3      4      5  ...  8      9     10      11    12    13
0 0.00632  18.0  2.31  0  0.538  6.575  ...  1  296.0  15.3  396.90  4.98  24.0
1 0.02731   0.0  7.07  0  0.469  6.421  ...  2  242.0  17.8  396.90  9.14  21.6
2 0.02729   0.0  7.07  0  0.469  7.185  ...  2  242.0  17.8  392.83  4.03  34.7
3 0.03237   0.0  2.18  0  0.458  6.998  ...  3  222.0  18.7  394.63  2.94  33.4
4 0.06905   0.0  2.18  0  0.458  7.147  ...  3  222.0  18.7  396.90  5.33  36.2

[5 rows x 14 columns]
```

Next, we can prepare the dataset for modeling. First, the dataset can be split into input and output columns, and then the rows can be split into training and test sets. In this case, we will use about 67% of the rows to train the model and the remaining 33% to estimate its performance.

```
# split into input and output values
X, y = values[:, :-1], values[:, -1]
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)
```

You can learn more about train-test splits in the tutorial "Train-Test Split for Evaluating Machine Learning Algorithms". Next, we scale all input columns (variables) to the range 0-1, called data normalization, which is a good practice when working with neural network models.

```
# scale input data
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

You can learn more about normalizing input data with the MinMaxScaler in the tutorial "How to Use StandardScaler and MinMaxScaler Transforms in Python".

A complete example of the data prepared for modeling is listed below.

```
# load and prepare the dataset for modeling
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
values = dataframe.values
# split into input and output values
X, y = values[:, :-1], values[:, -1]
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)
# scale input data
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# summarize
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Running the example loads the dataset as before, then splits the columns into input and output elements, splits the rows into training and test sets, and finally scales all input variables to the range [0, 1]. The shapes of the training and test sets are printed, showing that we have 339 rows for training the model and 167 rows for evaluating it.

```
(339, 13) (167, 13) (339,) (167,)
```

Next, we can define, fit, and evaluate a multilayer perceptron (MLP) model on the dataset. We will define a simple model with two hidden layers and an output layer that predicts a numeric value. We will use the ReLU activation function and "he" weight initialization, which are a good practice. The number of nodes in each hidden layer was chosen after a little trial and error.

```
# define neural network model
features = X_train.shape[1]
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=features))
model.add(Dense(5, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1))
```

We will fit the model using the efficient Adam version of stochastic gradient descent with close-to-default learning rate and momentum values, and use the mean squared error (MSE) loss function, a standard for regression predictive modeling problems.

```
# compile the model and specify loss and optimizer
opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)
model.compile(optimizer=opt, loss='mse')
```

You can learn more about Adam's optimization algorithm in this tutorial:

"Write code from scratch Adam gradient descent optimization"

The model will then be fit for 300 epochs with a batch size of 16 samples. This configuration was chosen after a little trial and error.

```
# fit the model on the training dataset
model.fit(X_train, y_train, verbose=2, epochs=300, batch_size=16)
```

You can learn more about batches and epochs in this tutorial:

"Differences between batches and periods in neural networks"

Finally, the model can be used to make predictions on the test dataset. We can evaluate the predictions by comparing them to the expected values in the test set and calculating the mean absolute error (MAE), a useful measure of model performance.

```
# make predictions on the test set
yhat = model.predict(X_test, verbose=0)
# calculate the average error in the predictions
mae = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % mae)
```

The complete example is as follows:

```
# train and evaluate a multilayer perceptron neural network on the housing regression dataset
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
values = dataframe.values
# split into input and output values
X, y = values[:, :-1], values[:, -1]
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)
# scale input data
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# define neural network model
features = X_train.shape[1]
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=features))
model.add(Dense(5, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1))
# compile the model and specify loss and optimizer
opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)
model.compile(optimizer=opt, loss='mse')
# fit the model on the training dataset
model.fit(X_train, y_train, verbose=2, epochs=300, batch_size=16)
# make predictions on the test set
yhat = model.predict(X_test, verbose=0)
# calculate the average error in the predictions
mae = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % mae)
```

Running the example will load and prepare the data set, define and fit the MLP model on the training data set, and evaluate its performance on the test set.

Note: Due to the randomness of the algorithm or evaluation procedure, or the difference in numerical precision, your results may be different. Consider running the example several times and comparing the average results.

In this case, we can see that the model achieved a mean absolute error of about 2.3, which is better than the naive model (about 6.6) and reasonably close to the top-performing model (about 1.9).

No doubt we could achieve near-optimal performance with further tuning of the model, but this is good enough for our investigation of prediction intervals.

```
Epoch 296/300
22/22 - 0s - loss: 7.1741
Epoch 297/300
22/22 - 0s - loss: 6.8044
Epoch 298/300
22/22 - 0s - loss: 6.8623
Epoch 299/300
22/22 - 0s - loss: 7.7010
Epoch 300/300
22/22 - 0s - loss: 6.5374
MAE: 2.300
```

Next, let's see how to calculate the prediction interval using the MLP model on the housing dataset.

**Neural network prediction interval**

In this section, we will use the regression problem and model developed in the previous section to develop the prediction interval.

Calculating a prediction interval for nonlinear regression algorithms like neural networks is challenging compared with linear methods like linear regression, where the prediction interval calculation is trivial. There is no standard technique. There are many ways to calculate an effective prediction interval for a neural network model; I recommend some of the papers listed in the "Further Reading" section to learn more.

In this tutorial, we will use a very simple approach that has plenty of room for extension. I call it "quick and dirty" because it is fast and easy to compute but has limitations. It involves fitting multiple final models (e.g., 10 to 30). The distribution of point predictions from the ensemble members is then used to calculate both a point prediction and a prediction interval.

For example, the point prediction can be taken as the mean of the point predictions from the ensemble members, and a 95% prediction interval can be taken as 1.96 standard deviations of those predictions around the mean. This is a simple Gaussian prediction interval, although alternatives could be used, such as the minimum and maximum of the point predictions. Alternatively, the bootstrap method could be used to train each ensemble member on a different bootstrap sample, and the 2.5th and 97.5th percentiles of the point predictions used as the prediction interval.
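As a concrete sketch of both options, suppose we have a hypothetical array of point predictions for one row from ten ensemble members (the numbers below are made up for illustration):

```python
from numpy import asarray, mean, std, percentile

# hypothetical point predictions for one row from 10 ensemble members
yhat = asarray([28.1, 30.2, 29.5, 31.0, 27.8, 30.6, 29.1, 28.9, 30.0, 29.7])
# gaussian 95% prediction interval: mean +/- 1.96 standard deviations
interval = 1.96 * std(yhat)
print('Gaussian: [%.3f, %.3f]' % (mean(yhat) - interval, mean(yhat) + interval))
# percentile-based interval: 2.5th and 97.5th percentiles of the predictions
lower, upper = percentile(yhat, 2.5), percentile(yhat, 97.5)
print('Percentile: [%.3f, %.3f]' % (lower, upper))
```

With only ten members the percentile estimates are crude; the percentile approach pays off with larger ensembles.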

For more information about the bootstrap method, see the tutorial:

"A brief introduction to the Bootstrap method"

These extensions are reserved as exercises; we will stick to simple Gaussian prediction intervals.
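For the bootstrap extension, each ensemble member would be fit on its own resampled copy of the training data. A minimal sketch of drawing one bootstrap sample (shown on synthetic stand-in data so it runs on its own):

```python
from numpy import arange
from numpy.random import default_rng
from sklearn.datasets import make_regression

# synthetic stand-in for the 339-row training set
X_train, y_train = make_regression(n_samples=339, n_features=13, random_state=1)
rng = default_rng(1)
# draw row indices with replacement to form one bootstrap sample
ix = rng.choice(arange(len(X_train)), size=len(X_train), replace=True)
X_boot, y_boot = X_train[ix], y_train[ix]
# each ensemble member would then be fit on its own (X_boot, y_boot)
print(X_boot.shape, y_boot.shape)
```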

Pretend that the training dataset defined in the previous section is the entire dataset and that we are training one or more final models on it. We can then make predictions with prediction intervals on the test set and evaluate how effective the interval might be in the future.

We can simplify the code by dividing the elements developed in the previous section into functions. First, let's define a function that loads and prepares a regression dataset from a URL.

```
# load and prepare the dataset
def load_dataset(url):
    dataframe = read_csv(url, header=None)
    values = dataframe.values
    # split into input and output values
    X, y = values[:, :-1], values[:, -1]
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)
    # scale input data
    scaler = MinMaxScaler()
    scaler.fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test
```

Next, we can define a function that defines and trains an MLP model on a given training dataset, then returns the fitted model ready for making predictions.

```
# define and fit the model
def fit_model(X_train, y_train):
    # define neural network model
    features = X_train.shape[1]
    model = Sequential()
    model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=features))
    model.add(Dense(5, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(1))
    # compile the model and specify loss and optimizer
    opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)
    model.compile(optimizer=opt, loss='mse')
    # fit the model on the training dataset
    model.fit(X_train, y_train, verbose=0, epochs=300, batch_size=16)
    return model
```

We need multiple models to make point predictions. These models will define the distribution of point predictions from which the interval can be estimated.

Therefore, we need to fit multiple models on the training dataset, and each model must differ so that it makes different predictions. This can be achieved through the stochastic nature of training an MLP: the random initial weights and the stochastic gradient descent optimization algorithm. The more models, the better the point predictions will estimate the capability of the model. I would recommend at least 10 models, and beyond 30 models there is probably little benefit.

The function below fits an ensemble of models and stores them in a list that is returned. Out of interest, each fitted model is also evaluated on the test set, and its test MAE is reported after it is fit. We expect each model's estimated performance on the hold-out test set to differ slightly, and the reported scores will help us confirm this expectation.

```
# fit an ensemble of models
def fit_ensemble(n_members, X_train, X_test, y_train, y_test):
    ensemble = list()
    for i in range(n_members):
        # define and fit the model on the training set
        model = fit_model(X_train, y_train)
        # evaluate model on the test set
        yhat = model.predict(X_test, verbose=0)
        mae = mean_absolute_error(y_test, yhat)
        print('>%d, MAE: %.3f' % (i+1, mae))
        # store the model
        ensemble.append(model)
    return ensemble
```

Finally, we can use the trained ensemble of models to make point predictions, which can then be summarized into a prediction interval.

The function below achieves this. First, each model makes a point prediction on the input data, then the 95% prediction interval is calculated and the lower bound, mean, and upper bound of the interval are returned.

The function is designed to take a single row as input, but it could easily be adapted to multiple rows.

```
# make predictions with the ensemble and calculate a prediction interval
def predict_with_pi(ensemble, X):
    # make predictions
    yhat = [model.predict(X, verbose=0) for model in ensemble]
    yhat = asarray(yhat)
    # calculate 95% gaussian prediction interval
    interval = 1.96 * yhat.std()
    lower, upper = yhat.mean() - interval, yhat.mean() + interval
    return lower, yhat.mean(), upper
```
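One way the single-row function might be adapted to score many rows at once is to compute the mean and standard deviation per row across the ensemble members. The sketch below uses simple stand-in callables in place of fitted Keras models so it runs on its own; with Keras models, `model(X)` would become `model.predict(X, verbose=0)`.

```python
from numpy import asarray

def predict_all_with_pi(ensemble, X):
    # stack member predictions into shape (n_members, n_rows)
    yhat = asarray([asarray(model(X)).flatten() for model in ensemble])
    # per-row mean and gaussian 95% interval across members
    mean = yhat.mean(axis=0)
    interval = 1.96 * yhat.std(axis=0)
    return mean - interval, mean, mean + interval

# demo with stand-in "models" (plain callables) instead of fitted Keras models
ensemble = [lambda X, b=b: X.sum(axis=1) + b for b in (-1.0, 0.0, 1.0)]
X = asarray([[1.0, 2.0], [3.0, 4.0]])
lower, mean, upper = predict_all_with_pi(ensemble, X)
print(lower, mean, upper)
```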

Finally, we can call these functions. First, load and prepare the dataset, then define and fit the ensemble.

```
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
X_train, X_test, y_train, y_test = load_dataset(url)
# fit ensemble
n_members = 30
ensemble = fit_ensemble(n_members, X_train, X_test, y_train, y_test)
```

We can then take a single row of data from the test set, make a prediction with a prediction interval, and report the result.

We also report the expected value, which we would expect to be covered by the prediction interval (perhaps close to 95% of the time; this is not entirely accurate, but a rough approximation).

```
# make predictions with prediction interval
newX = asarray([X_test[0, :]])
lower, mean, upper = predict_with_pi(ensemble, newX)
print('Point prediction: %.3f' % mean)
print('95%% prediction interval: [%.3f, %.3f]' % (lower, upper))
print('True value: %.3f' % y_test[0])
```

Tying this together, the complete example of making predictions with a prediction interval using a multilayer perceptron neural network is listed below.

```
# prediction interval for mlps on the housing regression dataset
from numpy import asarray
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# load and prepare the dataset
def load_dataset(url):
    dataframe = read_csv(url, header=None)
    values = dataframe.values
    # split into input and output values
    X, y = values[:, :-1], values[:, -1]
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=1)
    # scale input data
    scaler = MinMaxScaler()
    scaler.fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test

# define and fit the model
def fit_model(X_train, y_train):
    # define neural network model
    features = X_train.shape[1]
    model = Sequential()
    model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=features))
    model.add(Dense(5, kernel_initializer='he_normal', activation='relu'))
    model.add(Dense(1))
    # compile the model and specify loss and optimizer
    opt = Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)
    model.compile(optimizer=opt, loss='mse')
    # fit the model on the training dataset
    model.fit(X_train, y_train, verbose=0, epochs=300, batch_size=16)
    return model

# fit an ensemble of models
def fit_ensemble(n_members, X_train, X_test, y_train, y_test):
    ensemble = list()
    for i in range(n_members):
        # define and fit the model on the training set
        model = fit_model(X_train, y_train)
        # evaluate model on the test set
        yhat = model.predict(X_test, verbose=0)
        mae = mean_absolute_error(y_test, yhat)
        print('>%d, MAE: %.3f' % (i+1, mae))
        # store the model
        ensemble.append(model)
    return ensemble

# make predictions with the ensemble and calculate a prediction interval
def predict_with_pi(ensemble, X):
    # make predictions
    yhat = [model.predict(X, verbose=0) for model in ensemble]
    yhat = asarray(yhat)
    # calculate 95% gaussian prediction interval
    interval = 1.96 * yhat.std()
    lower, upper = yhat.mean() - interval, yhat.mean() + interval
    return lower, yhat.mean(), upper

# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
X_train, X_test, y_train, y_test = load_dataset(url)
# fit ensemble
n_members = 30
ensemble = fit_ensemble(n_members, X_train, X_test, y_train, y_test)
# make predictions with prediction interval
newX = asarray([X_test[0, :]])
lower, mean, upper = predict_with_pi(ensemble, newX)
print('Point prediction: %.3f' % mean)
print('95%% prediction interval: [%.3f, %.3f]' % (lower, upper))
print('True value: %.3f' % y_test[0])
```

Running the example fits each ensemble member in turn and reports its estimated performance on the hold-out test set; finally, a single prediction with a prediction interval is made and reported.

Note: Due to the randomness of the algorithm or evaluation procedure, or the difference in numerical precision, your results may be different. Consider running the example several times and comparing the average results.

In this case, we can see that each model has slightly different performance, confirming our expectation that the models are indeed different.

Finally, we can see that the ensemble made a point prediction of about 30.5 with a 95% prediction interval of [26.287, 34.822]. We can also see that the true value was 28.2 and that the interval does capture this value, which is great.

```
>1, MAE: 2.259
>2, MAE: 2.144
>3, MAE: 2.732
>4, MAE: 2.628
>5, MAE: 2.483
>6, MAE: 2.551
>7, MAE: 2.505
>8, MAE: 2.299
>9, MAE: 2.706
>10, MAE: 2.145
>11, MAE: 2.765
>12, MAE: 3.244
>13, MAE: 2.385
>14, MAE: 2.592
>15, MAE: 2.418
>16, MAE: 2.493
>17, MAE: 2.367
>18, MAE: 2.569
>19, MAE: 2.664
>20, MAE: 2.233
>21, MAE: 2.228
>22, MAE: 2.646
>23, MAE: 2.641
>24, MAE: 2.492
>25, MAE: 2.558
>26, MAE: 2.416
>27, MAE: 2.328
>28, MAE: 2.383
>29, MAE: 2.215
>30, MAE: 2.408
Point prediction: 30.555
95% prediction interval: [26.287, 34.822]
True value: 28.200
```

As noted above, this is a quick-and-dirty technique for making predictions with a prediction interval for neural networks. There are easy extensions, such as applying the bootstrap method to the point predictions, which may be more reliable, as well as the more advanced techniques described in some of the papers I recommend exploring below.
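One simple way to sanity-check such an interval is to measure its empirical coverage over the hold-out set: the fraction of test rows whose true value falls inside the interval should be near 95%. The sketch below uses synthetic stand-in predictions so it runs on its own; in practice you would use the fitted ensemble's predictions over `X_test`. Note that ensemble-spread intervals are not guaranteed to be well calibrated, which is exactly what this check would reveal.

```python
from numpy.random import default_rng

rng = default_rng(1)
# stand-in data: 167 true values and (n_members, n_rows) ensemble predictions
y_test = rng.normal(25.0, 5.0, size=167)
yhat = y_test + rng.normal(0.0, 2.0, size=(30, 167))
# per-row gaussian 95% interval from the member spread
mean = yhat.mean(axis=0)
interval = 1.96 * yhat.std(axis=0)
lower, upper = mean - interval, mean + interval
# empirical coverage: fraction of true values inside their interval
coverage = ((y_test >= lower) & (y_test <= upper)).mean()
print('Coverage: %.3f' % coverage)
```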

Author: Yishui Hancheng, CSDN blog expert, personal research direction: machine learning, deep learning, NLP, CV

Blog: yishuihancheng.blog.csdn.net