21  Basics of Neural Networks

21.1 Overview

Neural networks are a class of machine learning algorithms that are inspired by the structure and function of the human brain. Like random forests, they are used for both classification and regression tasks. However, instead of using a collection of decision trees, neural networks consist of layers of interconnected nodes (or neurons) that can process and learn from complex data.

Gradient boosting can also be used with base learners other than decision trees, including neural networks. In that case, instead of combining decision trees, the algorithm combines multiple small neural networks into an ensemble. The main idea remains the same: iteratively train each new model on the residual errors of the previous models to minimise a loss function.
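
As a rough illustration of this idea (a toy sketch in base R, not the implementation used by any boosting library), small single-hidden-layer networks from the nnet package can be fitted repeatedly to the residuals of the running prediction. The data, the number of rounds and the shrinkage value below are all made up for illustration.

library(nnet)

set.seed(42)
x_toy <- seq(0, 2 * pi, length.out = 200)
y_toy <- sin(x_toy) + rnorm(200, sd = 0.1)

n_rounds  <- 10    # number of boosting rounds (illustrative choice)
shrinkage <- 0.3   # learning rate applied to each weak learner (illustrative choice)

pred <- rep(mean(y_toy), length(y_toy))   # start from the mean prediction

for (m in seq_len(n_rounds)) {
  residual <- y_toy - pred                # pseudo-residuals for squared-error loss
  fit <- nnet(x = matrix(x_toy), y = residual, size = 3,
              linout = TRUE, maxit = 500, trace = FALSE)
  pred <- pred + shrinkage * as.numeric(predict(fit, matrix(x_toy)))
}

mean((y_toy - pred)^2)   # the training error shrinks as rounds are added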

The basic building block of a neural network is the perceptron, which takes a set of inputs, applies a weight to each, sums the results, and passes the weighted sum through an activation function that determines whether (and how strongly) the neuron fires. By stacking layers of such neurons and activation functions, a neural network can learn to model highly non-linear relationships in the data.
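
To make this concrete, the following toy sketch in base R (not taken from any package; the numbers are made up) computes the output of a single neuron as a weighted sum of its inputs plus a bias, passed through a logistic (sigmoid) activation.

# A single artificial neuron: activation(weights . inputs + bias)
sigmoid <- function(z) 1 / (1 + exp(-z))

neuron <- function(inputs, weights, bias) {
  sigmoid(sum(inputs * weights) + bias)
}

neuron(inputs = c(0.5, 0.8), weights = c(0.4, -0.6), bias = 0.1)
# returns a value between 0 and 1; values close to 1 mean the neuron "fires"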

The following R code demonstrates creating and visualising a feedforward neural network using the nnet and NeuralNetTools packages. It uses the iris dataset to predict species from sepal and petal dimensions. The numeric features are normalised and the target variable is converted to a factor. A neural network with one hidden layer containing 5 neurons is then trained (nnet applies a logistic activation to its hidden units) with a maximum of 100,000 iterations. Finally, the network architecture is visualised with the plotnet function from the NeuralNetTools package.

library(nnet)
library(NeuralNetTools)

# Load data
data(iris)

# Normalize the numeric features
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

iris[,1:4] <- lapply(iris[,1:4], normalize)

# Convert the target variable into a factor
iris$Species <- as.factor(iris$Species)

set.seed(42)

# Define the neural network architecture
nn <- nnet(
  Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = iris,
  size = 5,        # one hidden layer with 5 neurons (logistic activation by default)
  linout = FALSE,  # non-linear output units, the default for classification
  maxit = 1e+05    # maximum number of training iterations
)
#> # weights:  43
#> initial  value 180.827907 
#> iter  10 value 9.913449
#> iter  20 value 6.015153
#> iter  30 value 5.921213
#> iter  40 value 5.553101
#> iter  50 value 4.239151
#> iter  60 value 1.156178
#> iter  70 value 0.296792
#> iter  80 value 0.012652
#> iter  90 value 0.006984
#> iter 100 value 0.000851
#> iter 110 value 0.000175
#> final  value 0.000060 
#> converged

# Visualise
plotnet(nn)
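
As a quick check of what the fitted network has learned (an illustrative addition, not part of the original example), predictions on the training data can be obtained with predict() and cross-tabulated against the true species.

# Predicted classes for the training data versus the true labels
preds <- predict(nn, iris, type = "class")
table(Predicted = preds, Actual = iris$Species)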

The example network has only one hidden layer (the nnet package itself supports only a single hidden layer), but neural networks in general can have many hidden layers, which allows them to capture highly complex patterns and relationships in the data. This added flexibility, however, can also lead to overfitting, where the model fits the training data too closely and fails to generalise to new data. Techniques such as regularisation and early stopping can be used to avoid overfitting.

In the next example, we’ll demonstrate L2 regularisation in a neural network using the keras package in R. We will continue using the iris dataset for this illustration. Our goal is to predict the species of iris flowers based on their sepal and petal dimensions.

# Install and load the required libraries:
library(keras)
#> 
#> Attaching package: 'keras'
#> The following object is masked _by_ '.GlobalEnv':
#> 
#>     normalize

# Load the iris dataset and pre-process it:
data(iris)

# Normalize the numeric features
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

iris[,1:4] <- lapply(iris[,1:4], normalize)

# Convert the target variable into a one-hot encoded matrix
iris$Species <- as.factor(iris$Species)
y <- to_categorical(as.numeric(iris$Species) - 1)
X <- as.matrix(iris[, 1:4])

# Set the seed for reproducibility (note: set.seed() seeds R's own random number
# generator; keras/TensorFlow uses its own, so results may still vary between runs)
set.seed(42)

# Define the L2 regularization term
l2_regularizer <- regularizer_l2(l = 0.01)

# Define the neural network architecture: one hidden layer with 5 ReLU units
# and a 3-unit softmax output layer, both with an L2 penalty on their weights
model <- keras_model_sequential() %>%
  layer_dense(units = 5, activation = "relu", input_shape = ncol(X),
              kernel_regularizer = l2_regularizer) %>%
  layer_dense(units = 3, activation = "softmax", kernel_regularizer = l2_regularizer)

# Compile the model
model %>% compile(
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)

# Train the model
history <- model %>% fit(
  X, y,
  epochs = 100,
  batch_size = 16,
  validation_split = 0.2
)
#> Epoch 1/100
#> 8/8 - 1s - loss: 1.2971 - accuracy: 0.1667 - val_loss: 0.5406 - val_accuracy: 1.0000 - 903ms/epoch - 113ms/step
#> Epoch 2/100
#> 8/8 - 0s - loss: 1.2692 - accuracy: 0.1667 - val_loss: 0.5738 - val_accuracy: 1.0000 - 76ms/epoch - 10ms/step
#> Epoch 3/100
#> 8/8 - 0s - loss: 1.2405 - accuracy: 0.1667 - val_loss: 0.6075 - val_accuracy: 1.0000 - 49ms/epoch - 6ms/step
#> Epoch 4/100
#> 8/8 - 0s - loss: 1.2158 - accuracy: 0.1667 - val_loss: 0.6421 - val_accuracy: 1.0000 - 49ms/epoch - 6ms/step
#> Epoch 5/100
#> 8/8 - 0s - loss: 1.1907 - accuracy: 0.1667 - val_loss: 0.6759 - val_accuracy: 1.0000 - 70ms/epoch - 9ms/step
#> Epoch 6/100
#> 8/8 - 0s - loss: 1.1682 - accuracy: 0.1667 - val_loss: 0.7108 - val_accuracy: 1.0000 - 62ms/epoch - 8ms/step
#> Epoch 7/100
#> 8/8 - 0s - loss: 1.1476 - accuracy: 0.1750 - val_loss: 0.7468 - val_accuracy: 1.0000 - 64ms/epoch - 8ms/step
#> Epoch 8/100
#> 8/8 - 0s - loss: 1.1260 - accuracy: 0.2000 - val_loss: 0.7790 - val_accuracy: 1.0000 - 51ms/epoch - 6ms/step
#> Epoch 9/100
#> 8/8 - 0s - loss: 1.1073 - accuracy: 0.2333 - val_loss: 0.8129 - val_accuracy: 1.0000 - 41ms/epoch - 5ms/step
#> Epoch 10/100
#> 8/8 - 0s - loss: 1.0889 - accuracy: 0.3167 - val_loss: 0.8452 - val_accuracy: 1.0000 - 39ms/epoch - 5ms/step
#> Epoch 11/100
#> 8/8 - 0s - loss: 1.0734 - accuracy: 0.4417 - val_loss: 0.8795 - val_accuracy: 1.0000 - 40ms/epoch - 5ms/step
#> Epoch 12/100
#> 8/8 - 0s - loss: 1.0560 - accuracy: 0.5417 - val_loss: 0.9047 - val_accuracy: 1.0000 - 36ms/epoch - 4ms/step
#> Epoch 13/100
#> 8/8 - 0s - loss: 1.0418 - accuracy: 0.6167 - val_loss: 0.9318 - val_accuracy: 0.9333 - 46ms/epoch - 6ms/step
#> Epoch 14/100
#> 8/8 - 0s - loss: 1.0286 - accuracy: 0.6667 - val_loss: 0.9616 - val_accuracy: 0.8667 - 46ms/epoch - 6ms/step
#> Epoch 15/100
#> 8/8 - 0s - loss: 1.0150 - accuracy: 0.7583 - val_loss: 0.9884 - val_accuracy: 0.7667 - 43ms/epoch - 5ms/step
#> Epoch 16/100
#> 8/8 - 0s - loss: 1.0017 - accuracy: 0.8083 - val_loss: 1.0147 - val_accuracy: 0.7000 - 51ms/epoch - 6ms/step
#> Epoch 17/100
#> 8/8 - 0s - loss: 0.9893 - accuracy: 0.8417 - val_loss: 1.0418 - val_accuracy: 0.6667 - 47ms/epoch - 6ms/step
#> Epoch 18/100
#> 8/8 - 0s - loss: 0.9785 - accuracy: 0.8583 - val_loss: 1.0720 - val_accuracy: 0.5000 - 49ms/epoch - 6ms/step
#> Epoch 19/100
#> 8/8 - 0s - loss: 0.9656 - accuracy: 0.8333 - val_loss: 1.0908 - val_accuracy: 0.3333 - 91ms/epoch - 11ms/step
#> Epoch 20/100
#> 8/8 - 0s - loss: 0.9553 - accuracy: 0.8500 - val_loss: 1.1139 - val_accuracy: 0.2333 - 42ms/epoch - 5ms/step
#> Epoch 21/100
#> 8/8 - 0s - loss: 0.9448 - accuracy: 0.8417 - val_loss: 1.1361 - val_accuracy: 0.1333 - 43ms/epoch - 5ms/step
#> Epoch 22/100
#> 8/8 - 0s - loss: 0.9345 - accuracy: 0.8417 - val_loss: 1.1550 - val_accuracy: 0.1000 - 44ms/epoch - 5ms/step
#> Epoch 23/100
#> 8/8 - 0s - loss: 0.9251 - accuracy: 0.8250 - val_loss: 1.1751 - val_accuracy: 0.0667 - 38ms/epoch - 5ms/step
#> Epoch 24/100
#> 8/8 - 0s - loss: 0.9154 - accuracy: 0.8333 - val_loss: 1.1901 - val_accuracy: 0.0667 - 37ms/epoch - 5ms/step
#> Epoch 25/100
#> 8/8 - 0s - loss: 0.9063 - accuracy: 0.8333 - val_loss: 1.2023 - val_accuracy: 0.0667 - 39ms/epoch - 5ms/step
#> Epoch 26/100
#> 8/8 - 0s - loss: 0.8982 - accuracy: 0.8250 - val_loss: 1.2200 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 27/100
#> 8/8 - 0s - loss: 0.8892 - accuracy: 0.8250 - val_loss: 1.2326 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 28/100
#> 8/8 - 0s - loss: 0.8808 - accuracy: 0.8333 - val_loss: 1.2419 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 29/100
#> 8/8 - 0s - loss: 0.8729 - accuracy: 0.8333 - val_loss: 1.2523 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 30/100
#> 8/8 - 0s - loss: 0.8654 - accuracy: 0.8333 - val_loss: 1.2609 - val_accuracy: 0.0000e+00 - 41ms/epoch - 5ms/step
#> Epoch 31/100
#> 8/8 - 0s - loss: 0.8581 - accuracy: 0.8333 - val_loss: 1.2715 - val_accuracy: 0.0000e+00 - 40ms/epoch - 5ms/step
#> Epoch 32/100
#> 8/8 - 0s - loss: 0.8506 - accuracy: 0.8333 - val_loss: 1.2738 - val_accuracy: 0.0000e+00 - 48ms/epoch - 6ms/step
#> Epoch 33/100
#> 8/8 - 0s - loss: 0.8434 - accuracy: 0.8333 - val_loss: 1.2829 - val_accuracy: 0.0000e+00 - 41ms/epoch - 5ms/step
#> Epoch 34/100
#> 8/8 - 0s - loss: 0.8365 - accuracy: 0.8333 - val_loss: 1.2911 - val_accuracy: 0.0000e+00 - 39ms/epoch - 5ms/step
#> Epoch 35/100
#> 8/8 - 0s - loss: 0.8297 - accuracy: 0.8333 - val_loss: 1.2948 - val_accuracy: 0.0000e+00 - 87ms/epoch - 11ms/step
#> Epoch 36/100
#> 8/8 - 0s - loss: 0.8232 - accuracy: 0.8333 - val_loss: 1.3012 - val_accuracy: 0.0000e+00 - 50ms/epoch - 6ms/step
#> Epoch 37/100
#> 8/8 - 0s - loss: 0.8165 - accuracy: 0.8333 - val_loss: 1.3010 - val_accuracy: 0.0000e+00 - 53ms/epoch - 7ms/step
#> Epoch 38/100
#> 8/8 - 0s - loss: 0.8102 - accuracy: 0.8333 - val_loss: 1.3019 - val_accuracy: 0.0000e+00 - 44ms/epoch - 5ms/step
#> Epoch 39/100
#> 8/8 - 0s - loss: 0.8039 - accuracy: 0.8333 - val_loss: 1.3054 - val_accuracy: 0.0000e+00 - 42ms/epoch - 5ms/step
#> Epoch 40/100
#> 8/8 - 0s - loss: 0.7979 - accuracy: 0.8333 - val_loss: 1.3026 - val_accuracy: 0.0000e+00 - 47ms/epoch - 6ms/step
#> Epoch 41/100
#> 8/8 - 0s - loss: 0.7918 - accuracy: 0.8333 - val_loss: 1.3017 - val_accuracy: 0.0000e+00 - 52ms/epoch - 6ms/step
#> Epoch 42/100
#> 8/8 - 0s - loss: 0.7859 - accuracy: 0.8333 - val_loss: 1.3025 - val_accuracy: 0.0000e+00 - 36ms/epoch - 4ms/step
#> Epoch 43/100
#> 8/8 - 0s - loss: 0.7801 - accuracy: 0.8333 - val_loss: 1.2989 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 44/100
#> 8/8 - 0s - loss: 0.7746 - accuracy: 0.8333 - val_loss: 1.3047 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 45/100
#> 8/8 - 0s - loss: 0.7688 - accuracy: 0.8333 - val_loss: 1.3036 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 46/100
#> 8/8 - 0s - loss: 0.7632 - accuracy: 0.8333 - val_loss: 1.3005 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 47/100
#> 8/8 - 0s - loss: 0.7579 - accuracy: 0.8333 - val_loss: 1.3013 - val_accuracy: 0.0000e+00 - 33ms/epoch - 4ms/step
#> Epoch 48/100
#> 8/8 - 0s - loss: 0.7528 - accuracy: 0.8333 - val_loss: 1.2915 - val_accuracy: 0.0000e+00 - 32ms/epoch - 4ms/step
#> Epoch 49/100
#> 8/8 - 0s - loss: 0.7472 - accuracy: 0.8333 - val_loss: 1.2899 - val_accuracy: 0.0000e+00 - 32ms/epoch - 4ms/step
#> Epoch 50/100
#> 8/8 - 0s - loss: 0.7421 - accuracy: 0.8333 - val_loss: 1.2866 - val_accuracy: 0.0000e+00 - 30ms/epoch - 4ms/step
#> Epoch 51/100
#> 8/8 - 0s - loss: 0.7371 - accuracy: 0.8333 - val_loss: 1.2873 - val_accuracy: 0.0000e+00 - 32ms/epoch - 4ms/step
#> Epoch 52/100
#> 8/8 - 0s - loss: 0.7322 - accuracy: 0.8333 - val_loss: 1.2802 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 53/100
#> 8/8 - 0s - loss: 0.7273 - accuracy: 0.8333 - val_loss: 1.2806 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 54/100
#> 8/8 - 0s - loss: 0.7225 - accuracy: 0.8333 - val_loss: 1.2814 - val_accuracy: 0.0000e+00 - 38ms/epoch - 5ms/step
#> Epoch 55/100
#> 8/8 - 0s - loss: 0.7179 - accuracy: 0.8333 - val_loss: 1.2858 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 56/100
#> 8/8 - 0s - loss: 0.7134 - accuracy: 0.8333 - val_loss: 1.2793 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 57/100
#> 8/8 - 0s - loss: 0.7090 - accuracy: 0.8333 - val_loss: 1.2748 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 58/100
#> 8/8 - 0s - loss: 0.7045 - accuracy: 0.8333 - val_loss: 1.2692 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 59/100
#> 8/8 - 0s - loss: 0.7003 - accuracy: 0.8333 - val_loss: 1.2657 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 60/100
#> 8/8 - 0s - loss: 0.6962 - accuracy: 0.8333 - val_loss: 1.2570 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 61/100
#> 8/8 - 0s - loss: 0.6920 - accuracy: 0.8333 - val_loss: 1.2458 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 62/100
#> 8/8 - 0s - loss: 0.6877 - accuracy: 0.8333 - val_loss: 1.2423 - val_accuracy: 0.0000e+00 - 32ms/epoch - 4ms/step
#> Epoch 63/100
#> 8/8 - 0s - loss: 0.6840 - accuracy: 0.8333 - val_loss: 1.2358 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 64/100
#> 8/8 - 0s - loss: 0.6798 - accuracy: 0.8333 - val_loss: 1.2373 - val_accuracy: 0.0000e+00 - 33ms/epoch - 4ms/step
#> Epoch 65/100
#> 8/8 - 0s - loss: 0.6761 - accuracy: 0.8333 - val_loss: 1.2359 - val_accuracy: 0.0000e+00 - 32ms/epoch - 4ms/step
#> Epoch 66/100
#> 8/8 - 0s - loss: 0.6725 - accuracy: 0.8333 - val_loss: 1.2430 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 67/100
#> 8/8 - 0s - loss: 0.6690 - accuracy: 0.8333 - val_loss: 1.2358 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 68/100
#> 8/8 - 0s - loss: 0.6652 - accuracy: 0.8333 - val_loss: 1.2337 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 69/100
#> 8/8 - 0s - loss: 0.6618 - accuracy: 0.8333 - val_loss: 1.2264 - val_accuracy: 0.0000e+00 - 36ms/epoch - 5ms/step
#> Epoch 70/100
#> 8/8 - 0s - loss: 0.6584 - accuracy: 0.8333 - val_loss: 1.2254 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 71/100
#> 8/8 - 0s - loss: 0.6552 - accuracy: 0.8333 - val_loss: 1.2223 - val_accuracy: 0.0000e+00 - 38ms/epoch - 5ms/step
#> Epoch 72/100
#> 8/8 - 0s - loss: 0.6519 - accuracy: 0.8333 - val_loss: 1.2094 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 73/100
#> 8/8 - 0s - loss: 0.6487 - accuracy: 0.8333 - val_loss: 1.2020 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 74/100
#> 8/8 - 0s - loss: 0.6460 - accuracy: 0.8333 - val_loss: 1.1919 - val_accuracy: 0.0000e+00 - 36ms/epoch - 5ms/step
#> Epoch 75/100
#> 8/8 - 0s - loss: 0.6428 - accuracy: 0.8333 - val_loss: 1.1898 - val_accuracy: 0.0000e+00 - 50ms/epoch - 6ms/step
#> Epoch 76/100
#> 8/8 - 0s - loss: 0.6398 - accuracy: 0.8333 - val_loss: 1.1825 - val_accuracy: 0.0000e+00 - 39ms/epoch - 5ms/step
#> Epoch 77/100
#> 8/8 - 0s - loss: 0.6368 - accuracy: 0.8333 - val_loss: 1.1822 - val_accuracy: 0.0000e+00 - 38ms/epoch - 5ms/step
#> Epoch 78/100
#> 8/8 - 0s - loss: 0.6341 - accuracy: 0.8333 - val_loss: 1.1809 - val_accuracy: 0.0000e+00 - 36ms/epoch - 5ms/step
#> Epoch 79/100
#> 8/8 - 0s - loss: 0.6314 - accuracy: 0.8333 - val_loss: 1.1760 - val_accuracy: 0.0000e+00 - 37ms/epoch - 5ms/step
#> Epoch 80/100
#> 8/8 - 0s - loss: 0.6288 - accuracy: 0.8333 - val_loss: 1.1785 - val_accuracy: 0.0000e+00 - 36ms/epoch - 5ms/step
#> Epoch 81/100
#> 8/8 - 0s - loss: 0.6262 - accuracy: 0.8333 - val_loss: 1.1850 - val_accuracy: 0.0000e+00 - 43ms/epoch - 5ms/step
#> Epoch 82/100
#> 8/8 - 0s - loss: 0.6237 - accuracy: 0.8333 - val_loss: 1.1831 - val_accuracy: 0.0000e+00 - 39ms/epoch - 5ms/step
#> Epoch 83/100
#> 8/8 - 0s - loss: 0.6213 - accuracy: 0.8333 - val_loss: 1.1809 - val_accuracy: 0.0000e+00 - 40ms/epoch - 5ms/step
#> Epoch 84/100
#> 8/8 - 0s - loss: 0.6190 - accuracy: 0.8333 - val_loss: 1.1886 - val_accuracy: 0.0000e+00 - 39ms/epoch - 5ms/step
#> Epoch 85/100
#> 8/8 - 0s - loss: 0.6166 - accuracy: 0.8333 - val_loss: 1.1798 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 86/100
#> 8/8 - 0s - loss: 0.6141 - accuracy: 0.8333 - val_loss: 1.1826 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 87/100
#> 8/8 - 0s - loss: 0.6120 - accuracy: 0.8333 - val_loss: 1.1784 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 88/100
#> 8/8 - 0s - loss: 0.6099 - accuracy: 0.8333 - val_loss: 1.1697 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 89/100
#> 8/8 - 0s - loss: 0.6076 - accuracy: 0.8333 - val_loss: 1.1688 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step
#> Epoch 90/100
#> 8/8 - 0s - loss: 0.6056 - accuracy: 0.8333 - val_loss: 1.1706 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 91/100
#> 8/8 - 0s - loss: 0.6036 - accuracy: 0.8333 - val_loss: 1.1741 - val_accuracy: 0.0000e+00 - 36ms/epoch - 4ms/step
#> Epoch 92/100
#> 8/8 - 0s - loss: 0.6015 - accuracy: 0.8333 - val_loss: 1.1681 - val_accuracy: 0.0000e+00 - 35ms/epoch - 4ms/step
#> Epoch 93/100
#> 8/8 - 0s - loss: 0.5997 - accuracy: 0.8333 - val_loss: 1.1632 - val_accuracy: 0.0000e+00 - 36ms/epoch - 4ms/step
#> Epoch 94/100
#> 8/8 - 0s - loss: 0.5977 - accuracy: 0.8333 - val_loss: 1.1643 - val_accuracy: 0.0000e+00 - 36ms/epoch - 5ms/step
#> Epoch 95/100
#> 8/8 - 0s - loss: 0.5960 - accuracy: 0.8333 - val_loss: 1.1651 - val_accuracy: 0.0000e+00 - 44ms/epoch - 6ms/step
#> Epoch 96/100
#> 8/8 - 0s - loss: 0.5942 - accuracy: 0.8333 - val_loss: 1.1544 - val_accuracy: 0.0000e+00 - 50ms/epoch - 6ms/step
#> Epoch 97/100
#> 8/8 - 0s - loss: 0.5923 - accuracy: 0.8333 - val_loss: 1.1473 - val_accuracy: 0.0000e+00 - 45ms/epoch - 6ms/step
#> Epoch 98/100
#> 8/8 - 0s - loss: 0.5909 - accuracy: 0.8333 - val_loss: 1.1571 - val_accuracy: 0.0000e+00 - 45ms/epoch - 6ms/step
#> Epoch 99/100
#> 8/8 - 0s - loss: 0.5889 - accuracy: 0.8333 - val_loss: 1.1479 - val_accuracy: 0.0000e+00 - 38ms/epoch - 5ms/step
#> Epoch 100/100
#> 8/8 - 0s - loss: 0.5873 - accuracy: 0.8333 - val_loss: 1.1397 - val_accuracy: 0.0000e+00 - 34ms/epoch - 4ms/step


# Plot the training and validation accuracy:
plot(history)
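
Note how val_accuracy collapses to zero while the training accuracy rises. This happens because validation_split in keras takes the last 20% of the rows before any shuffling, and iris is ordered by species, so the validation set contains only virginica flowers, most of which are absent from training. A simple fix (sketched below; shuffled and history_shuffled are names introduced only for this illustration) is to shuffle the rows before fitting.

# Shuffle the rows so the 20% validation split contains all three species.
# In practice the model would be re-created and re-compiled first, so that
# training starts from fresh weights rather than continuing the run above.
set.seed(42)
shuffled <- sample(nrow(X))

history_shuffled <- model %>% fit(
  X[shuffled, ], y[shuffled, ],
  epochs = 100,
  batch_size = 16,
  validation_split = 0.2
)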

Note the l2_regularizer applied in the code: L2 regularisation is a popular technique for preventing overfitting, adding a penalty term proportional to the squared weights to the loss function. In more advanced treatments of neural networks you will also encounter other regularisation techniques, including dropout, batch normalisation, data augmentation, early stopping, and adversarial training, all aimed at preventing overfitting and improving generalisation performance.
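
Two of these techniques are easy to try with keras. The sketch below (the layer sizes, dropout rate, patience value and the names model_dropout and history_dropout are illustrative choices, not recommendations) adds a dropout layer and stops training when the validation loss stops improving.

# A dropout layer randomly zeroes a fraction of hidden units during training,
# and the early-stopping callback halts training once val_loss stalls.
model_dropout <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = ncol(X)) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 3, activation = "softmax")

model_dropout %>% compile(
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)

history_dropout <- model_dropout %>% fit(
  X, y,
  epochs = 200,
  batch_size = 16,
  validation_split = 0.2,
  callbacks = list(
    callback_early_stopping(monitor = "val_loss", patience = 10)
  )
)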

Hyperparameters

Settings such as epochs, batch_size, and validation_split in the code above are called hyperparameters. One of the key challenges in working with neural networks is tuning hyperparameters such as the number of layers, the number of nodes in each layer, the learning rate, and the activation function. Finding optimal values for these hyperparameters can be time-consuming and computationally expensive.
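
A minimal way to explore hyperparameters is a small grid search, sketched below with nnet (the grid values, the 80/20 split and the object names are illustrative choices, not recommendations).

# Try every combination of hidden-layer size and weight decay, and record the
# accuracy on a held-out 20% of the data for each combination.
library(nnet)

set.seed(42)
grid <- expand.grid(size = c(2, 5, 10), decay = c(0, 0.01, 0.1))

grid$accuracy <- mapply(function(size, decay) {
  idx  <- sample(nrow(iris), 0.8 * nrow(iris))
  fit  <- nnet(Species ~ ., data = iris[idx, ], size = size,
               decay = decay, maxit = 500, trace = FALSE)
  pred <- predict(fit, iris[-idx, ], type = "class")
  mean(pred == iris$Species[-idx])
}, grid$size, grid$decay)

grid[order(-grid$accuracy), ]   # best-performing settings first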

Despite these challenges, neural networks have proven highly effective in a wide range of applications, such as image and speech recognition, natural language processing, and predictive modelling. In fact, deep learning, which refers to the use of neural networks with many layers, has revolutionised many fields, including computer vision and natural language processing.

Deep learning

Deep learning is a subfield of machine learning that uses artificial neural networks with multiple layers to learn hierarchical representations of data. Neural networks more generally are a class of algorithms inspired by the structure and function of biological neurons, and can be used for a variety of tasks, including classification, regression, and clustering.

The main difference between deep learning and neural networks is that deep learning refers specifically to neural networks with many layers, typically more than three or four, while neural networks can have any number of layers. Deep learning is often used for tasks such as image recognition, natural language processing, and speech recognition, where the data is complex and high-dimensional, and hierarchical features are important for accurate modelling.

Another difference between deep learning and neural networks is that deep learning often requires a lot of computational resources, especially for training large models with millions of parameters, while neural networks can be trained on more modest hardware. Additionally, deep learning often involves more complex optimisation techniques, such as stochastic gradient descent with momentum and adaptive learning rate methods, to train the deep neural network effectively.

In summary, while neural networks and deep learning are related concepts, deep learning specifically refers to the use of neural networks with many layers to learn hierarchical representations of data, often requiring more computational resources and complex optimisation techniques.

Challenge

There are several popular R packages for neural networks and deep learning, each with its own strengths and weaknesses. Here are some of the most commonly used:

| Package | Description | Advantages | Limitations |
|---|---|---|---|
| neuralnet | Trains feedforward neural networks with one or more hidden layers | Easy to use; supports various activation functions and training algorithms | No GPU support; slow for large datasets or deep architectures |
| nnet | Trains feedforward neural networks with a single hidden layer | Easy to use; supports various training options | Limited to a single hidden layer |
| caret | Provides a unified interface to many machine learning algorithms, including neural networks | Supports many different types of neural networks and training algorithms | Can be slow and memory-intensive for large datasets |
| keras | Builds and trains deep learning models through the Keras API | Supports a wide range of network architectures and layers, with GPU acceleration for faster training | Requires installation of TensorFlow or another backend |
| tensorflow | Provides an R interface to the TensorFlow library for building and training deep learning models | Offers a wide range of tools and features for neural networks and deep learning | Can be more difficult to use than some other packages |
| mxnet | Builds and trains deep learning models using the MXNet library | Supports a wide range of network architectures and layers, with GPU acceleration for faster training | Can be more difficult to use than some other packages |

Choose one package and try to use it to predict furniture price as detailed here.