class: center, middle, inverse, title-slide # A gentle introduction to deep learning in R using Keras ### Ladislas Nalborczyk ### Aix Marseille Univ, CNRS, LPC, LNC, Marseille, France ### 21.05.2021
@lnalborczyk
Slides available at
tinyurl.com/vendrediquanti
--- # Overview 1. Theoretical background + What is deep learning? + Deep learning recipes (e.g., backpropagation, overfitting) 2. Practical part / tutorial + Worked example #1: Fashion MNIST classification using a fully connected network + Worked example #2: Surface EMG signals classification using a 1d CNN --- class: inverse, center, middle <iframe width="100%" height="100%" src="https://www.youtube.com/embed/cQ54GDm1eL0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- class: center, middle background-image: url(figures/reface.jpeg) background-position: center background-size: cover --- # What is deep learning? AI includes symbolic expressions, logic rules, as well as handcrafted nested if-else statements. Machine learning includes supervised and unsupervised learning models, generalised linear models, tree-based methods, SVMs, clustering methods, etc. Deep learning is the focus of this talk and mostly (but not only) covers deep artificial neural networks (figure from [Sebastian Raschka](https://github.com/rasbt/stat453-deep-learning-ss20)). <img src="figures/deep1.png" width="50%" style="display: block; margin: auto;" /> --- # Why deep learning? Engineering features by hand can be long and tedious... Can we learn the underlying features (at multiple levels of abstraction) directly from the data? <img src="figures/features.png" width="75%" style="display: block; margin: auto;" /> --- # What's deep in deep learning? .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification. Deep-learning methods are representation-learning methods with multiple levels of representation [...] .tr[ — LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. ]] --- # Why now and why so successful? <img src="figures/history.jpg" width="75%" style="display: block; margin: auto;" /> Figure from http://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html. --- # Why now and why so successful? - Big **data**: Large (high-quality and labelled) datasets, easier collection and storage - Big **hardware**: Graphics Processing Units (GPUs), massive parallelisation - Big **software**: Improved techniques (e.g., activation functions, regularisation), new models, new toolboxes <img src="figures/softwares.png" width="50%" style="display: block; margin: auto;" /> --- # Biological motivation In essence, each neuron takes information from other neurons, processes it, and then produces an output. One could imagine that certain neurons output information based on raw sensory inputs, other neurons build higher representations on that, and so on until one gets outputs that are significant at a higher level. Figure taken from this [post](https://rstudio-pubs-static.s3.amazonaws.com/146706_0754aef7cdab424ebe0ac4e0e5aa362e.html). <img src="figures/neuron.png" width="75%" style="display: block; margin: auto;" /> --- # The Perceptron: forward propagation Figure taken from http://introtodeeplearning.com.
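The figure below illustrates forward propagation. As a complement, here is a minimal base-R sketch of what a single perceptron computes (the inputs, weights, and bias below are made up for illustration): a weighted sum of the inputs plus a bias, passed through a non-linear activation function.

```r
# a single perceptron: weighted sum of the inputs plus a bias (intercept),
# passed through a non-linear activation function (here, the logistic sigmoid)
sigmoid <- function(z) 1 / (1 + exp(-z) )

x <- c(0.5, -1.2, 0.3)  # made-up inputs
w <- c(0.8, 0.1, -0.4)  # made-up weights
b <- 0.2                # made-up bias (intercept)

y_hat <- sigmoid(sum(w * x) + b)
y_hat
```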
<img src="figures/perceptron.png" width="75%" style="display: block; margin: auto;" /> --- # Common activation functions <img src="figures/activation.png" width="100%" style="display: block; margin: auto;" /> --- # The importance of (non-linear) activation functions ```r library(tidyverse) # for data wrangling & visualisation n <- 1e3 df <- data.frame(x1 = runif(n), x2 = runif(n) ) df$y <- factor(ifelse(df$x1 - df$x2 > 0, -1, 1), levels = c(-1, 1) ) df %>% ggplot(aes(x = x1, y = x2, color = y) ) + geom_point(show.legend = FALSE) + theme_bw(base_size = 12) + labs(x = "Variable 1", y = "Variable 2") ``` <img src="vendredi_quantis_files/figure-html/linear-1.svg" width="50%" style="display: block; margin: auto;" /> --- # The importance of (non-linear) activation functions ```r n <- 1e3 df <- data.frame(x1 = runif(n), x2 = runif(n) ) df$y <- factor(ifelse(0.3 < sqrt(df$x1^2 + df$x2^2) & sqrt(df$x1^2 + df$x2^2) < 0.8, -1, 1), levels = c(-1, 1) ) df %>% ggplot(aes(x = x1, y = x2, color = y) ) + geom_point(show.legend = FALSE) + theme_bw(base_size = 12) + labs(x = "Variable 1", y = "Variable 2") ``` <img src="vendredi_quantis_files/figure-html/nonlinear-1.svg" width="50%" style="display: block; margin: auto;" /> --- # Going deeper <img src="figures/deep.png" width="75%" style="display: block; margin: auto;" /> --- # Universal Approximation Theorem Deep neural networks work so well because they are **universal function approximators**. Specifically, we know that a feedforward network with a linear output layer and at least one hidden layer using any nonconstant, bounded, and monotonically-increasing continuous ("squashing") activation function, such as the logistic sigmoid, can approximate any continuous function on a compact domain with any desired nonzero error, provided the hidden layer contains enough units. Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359-366. .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ A feedforward network with a single layer is sufficient to represent any function, but the layer may be infeasibly large and may fail to learn and generalize correctly. .tr[ — Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. ]] --- class: middle, center # How does it learn? --- # Building an intuition Example and figures taken from https://e2eml.school/how_backpropagation_works.html. We want to take the perfect shower, but our shower head is finicky... We have two valves that can be used to adjust the water flow rate: the shower handle and the main valve for the house. We are going to use backpropagation to get them adjusted just right. <img src="figures/system.png" width="50%" style="display: block; margin: auto;" /> --- # Sensitivity What's the change in water flow rate when we change either one of these valves? In other words, how *sensitive* is the water flow rate to either one of these valve settings? We can measure how a one-unit change in the valve settings affects the flow rate (e.g., in units of cubic feet per minute). For instance, if we change the shower handle position from 4 to 8 and notice that the shower flow rate changes from 3 to 5, the sensitivity of shower flow rate to shower handle position is `\((5-3) / (8-4) = 2 / 4 = 0.5\)`. <img src="figures/handle.png" width="50%" style="display: block; margin: auto;" /> --- # Sensitivity (with math) Sensitivity can be defined as the change in one thing per one-unit change in another thing.
For instance, the change in shower flow rate per one-unit change in the shower handle position. To make things simple, we will call the shower flow rate `\(y\)` and the shower handle position `\(h\)`. We call the flow rate in the house `\(x\)` and the position of the main valve `\(m\)`. <img src="figures/sensitivity.png" width="50%" style="display: block; margin: auto;" /> --- # Working backward On the last slide, we defined the shower flow rate as `\(y\)` and the shower handle position `\(h\)`. We called the flow rate in the house `\(x\)` and the position of the main valve `\(m\)`. The sensitivity of the shower flow rate to the shower handle position was `\(\partial{y} / \partial{h}\)`. What's the sensitivity for the main house valve? -- By dividing our change in `\(x\)` by our change in `\(m\)`, we can calculate the sensitivity of house flow rate to main valve setting, `\(\partial{x} / \partial{m}\)`. -- The output flow rate of the showerhead `\(y\)` depends both on the shower handle setting `\(h\)` and the house flow rate flowing into the shower handle `\(x\)`. We can put a number on this sensitivity too. Adjusting the main valve `\(m\)` changes the house flow rate `\(x\)`, which indirectly affects the shower flow rate `\(y\)`. By measuring the change in the house flow rate `\(x\)` and the corresponding change in the flow rate through the showerhead `\(y\)`, we can find the sensitivity of the shower flow rate to increases in the house flow rate, `\(\partial{y} / \partial{x}\)`. --- # Chain rule <img src="figures/system.png" width="33%" style="display: block; margin: auto;" /> We have a few different sensitivities, `\(\partial{y} / \partial{h}\)`, `\(\partial{y} / \partial{x}\)`, and `\(\partial{x} / \partial{m}\)`. But we do not know the sensitivity of the shower flow rate with respect to the main valve, `\(\partial{y} / \partial{m}\)`... but imagine we had measured `\(\partial{x} / \partial{m}\)` to be two and `\(\partial{y} / \partial{x}\)` to be `\(1/4\)`. We can multiply the two together to get the net result: `\(2 \times 1 / 4 = 1 / 2\)`. $$ \partial{y} / \partial{m} = \partial{y} / \partial{x} \times \partial{x} / \partial{m} $$ -- In other words, we can chain together sensitivities by multiplying them. This is known as the **chain rule**: for a composition of functions, `\(\frac{\partial}{\partial x} f(g(x)) = f'(g(x)) \times g'(x)\)`, which is what lets us compute partial derivatives layer by layer through a network. --- # How far from ideal is the shower? Let's say that our ideal shower flow rate is a special value of `\(y\)`, which we call `\(y'\)`. We can calculate our deviation, how far away from this ideal value we are, by taking `\(y - y'\)`. To express our unhappiness with the current state of the water flow, we can use how far away it is from the ideal: the absolute value of `\(y - y'\)` or `\(|y - y'|\)`. We'll call this `\(E\)`, our error, and we would like it to be zero. Our goal will be to adjust our valves `\(m\)` and `\(h\)` to make our shower flow rate perfect, drive `\(y\)` to be `\(y'\)`, and make `\(E\)` go to zero. <img src="figures/error.png" width="50%" style="display: block; margin: auto;" /> --- # How far from ideal is the shower? We can compute the sensitivity of `\(E\)` to changes in our shower flow rate. The derivative of an absolute value is straightforward: `\(\partial{E} / \partial{y} = 1\)` if `\(y\)` is greater than `\(y'\)` and it's `\(-1\)` if it's less than `\(y'\)`. It's not actually defined at `\(y = y'\)`, but we can just declare it to be zero. Now we can chain this with our other sensitivities to find the sensitivity of the error to our two valve positions. <img src="figures/abs_value.png" width="50%" style="display: block; margin: auto;" />
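To make the chaining concrete, here is a small numerical sketch in R. All the values below are made up for illustration: we plug in some measured sensitivities and the sign of `\(y - y'\)` to get the sensitivity of the error to each valve.

```r
# made-up sensitivities, measured as described above
dy_dh <- 0.5  # shower flow rate w.r.t. shower handle position
dx_dm <- 2    # house flow rate w.r.t. main valve position
dy_dx <- 1/4  # shower flow rate w.r.t. house flow rate

# made-up current and ideal shower flow rates
y <- 3
y_ideal <- 2.5

# derivative of the absolute error |y - y'| with respect to y
dE_dy <- ifelse(y > y_ideal, 1, -1)

# chaining sensitivities: error w.r.t. each valve position
dE_dh <- dE_dy * dy_dh           # via the shower handle
dE_dm <- dE_dy * dy_dx * dx_dm   # via the main valve
c(dE_dh, dE_dm)
```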
--- # How much should each valve be adjusted? Now we have one thing we want to change, the error, and two ways to change it. How do we go about it? This is where backpropagation comes in. The secret is to weight the adjustment (to each valve) by the sensitivity... The safest way to handle an uncertain, nonlinear, dynamic situation like this is to take tiny steps. Instead of trying to move the whole distance all at once, we move 1/100, or 1/1000, or 1/10000 of the way (the specific distance is governed by the learning rate `\(\eta\)`). <img src="figures/gradient_descent.png" width="50%" style="display: block; margin: auto;" /> --- # Making the first adjustment Now we finally know everything we need to adjust our valves and get our shower set up. Each valve adjustment will be proportional to the sensitivity of the error to that valve, and in the opposite direction (we want `\(E\)` to go down, not up). We multiply that by our learning rate, `\(\eta\)`. So for our first iteration, our adjustment to the shower handle, `\(\Delta h_{1}\)`, is `\(-\eta \times \partial{E} / \partial{y} \times \partial{y} / \partial{h}\)`. Similarly, the change to the main valve, `\(\Delta m_{1}\)`, is `\(-\eta \times \partial{E} / \partial{y} \times \partial{y} / \partial{x} \times \partial{x} / \partial{m}\)`. <img src="figures/update_rule.png" width="50%" style="display: block; margin: auto;" /> --- # Backpropagation in the real world We have seen how to apply backpropagation to two nodes (weights). In real settings, our network can have thousands or millions of parameters to adjust. However, the principles are the same. - Chain sensitivities back through the network - Make a small update - Observe the effects - Update the sensitivities throughout the network - And repeat --- # Backpropagation in the real world Figure taken from https://baptiste-monpezat.github.io/blog/stochastic-gradient-descent-for-machine-learning-clearly-explained. In this animation, the blue line corresponds to stochastic gradient descent and the red one is a basic gradient descent algorithm. <img src="figures/sgd.gif" width="75%" style="display: block; margin: auto;" /> --- # Fighting overfitting More complex models (i.e., models with more parameters) will always fit the *training* data better. How can we ensure that we learn enough (but not too much) from these training data, so that the model is able to generalise well to unseen data? <img src="figures/overfitting.png" width="75%" style="display: block; margin: auto;" /> --- # Fighting overfitting Using **regularisation techniques** (i.e., modifications of the learning algorithm intended to reduce testing error but not training error) such as **dropout**: during training, we randomly set some activations (the outputs of some nodes) to zero. This effectively forces the network not to rely on any given node. <img src="figures/dropout.png" width="75%" style="display: block; margin: auto;" /> --- # Fighting overfitting We can also use **early stopping**, that is, stopping the training before the model starts to overfit the training data. <img src="figures/early_stopping.png" width="75%" style="display: block; margin: auto;" />
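In the Keras R interface (used in the practical part below), early stopping can be requested through a training callback. The snippet below is a minimal sketch rather than code from the worked examples: the `model` and training data are the ones built later in the tutorial, and the `patience` value of 5 epochs is arbitrary.

```r
library(keras)

# stop training when the validation loss has not improved
# for 5 consecutive epochs (patience = 5)
history <- model %>% fit(
  x = train_images, y = train_labels,
  epochs = 100,
  validation_split = 0.2,
  callbacks = list(
    callback_early_stopping(monitor = "val_loss", patience = 5)
    )
  )
```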
--- # Aparté: Tensors Tensor is the general name for multidimensional array data. A 1d-tensor is simply a vector, a 2d-tensor is a matrix, a 3d-tensor is a cube. We can imagine a 4d-tensor as a vector of cubes, a 5d-tensor as a matrix of cubes, and a 6d-tensor as a cube of cubes. <img src="figures/tensors.png" width="50%" style="display: block; margin: auto;" /> --- class: middle, center # Practical part --- class: middle, center # Worked example #1: Fashion MNIST classification using a fully connected network --- # Installing and loading packages ```r # installing tensorflow install.packages("tensorflow") library(tensorflow) install_tensorflow() # installing keras install.packages("keras") library(keras) install_keras() ``` --- # MNIST - National Institute of Standards and Technology (NIST) database - MNIST (Modified NIST) - 60,000 training images and 10,000 testing images - normalised to fit into a 28-by-28 pixel bounding box <img src="figures/mnist.png" width="40%" style="display: block; margin: auto;" /> --- # Fashion MNIST data We will use the Fashion MNIST dataset, which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 by 28 pixels). <img src="figures/fashion.png" width="40%" style="display: block; margin: auto;" /> --- # Fashion MNIST data ```r # loading the keras inbuilt fashion MNIST dataset fashion_mnist <- dataset_fashion_mnist() # retrieving train and test data train_images <- fashion_mnist$train$x train_labels <- fashion_mnist$train$y test_images <- fashion_mnist$test$x test_labels <- fashion_mnist$test$y ``` -- At this point we have four arrays: the `train_images` and `train_labels` arrays are the training data (i.e., the data the model uses to learn). The model is tested against the test set: the `test_images` and `test_labels` arrays. The images are each 28 x 28 arrays, with pixel values ranging between 0 and 255. The labels are arrays of integers, ranging from 0 to 9. These correspond to the class of clothing the image represents: ```r class_names <- c( "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot" ) ``` --- # Exploring the data ```r # there are 60,000 images in the training set, with each image represented as 28 x 28 pixels dim(train_images) ``` ``` ## [1] 60000 28 28 ``` -- ```r # there are 60,000 labels dim(train_labels) ``` ``` ## [1] 60000 ``` -- ```r # each label is an integer between 0 and 9 train_labels[1:20] ``` ``` ## [1] 9 0 0 3 0 2 7 2 5 5 0 9 5 5 7 9 1 0 6 4 ``` --- # Exploring the data ```r # we rescale the image pixel values between 0 and 1 train_images <- train_images / 255 test_images <- test_images / 255 # plotting an item as an example img <- train_images[1, , ] img <- t(apply(img, 2, rev) ) image(x = 1:28, y = 1:28, z = img, col = gray((0:255) / 255), xaxt = "n", yaxt = "n") ``` <img src="vendredi_quantis_files/figure-html/unnamed-chunk-29-1.svg" width="33%" style="display: block; margin: auto;" /> --- # Defining a first model ```r model <- keras_model_sequential() %>% # creating a model in sequential mode layer_flatten(input_shape = c(28, 28) ) %>% # 2d-array to 1d-array layer_dense(units = 128, activation = "relu") %>% # densely-connected layer layer_dropout(rate = 0.3) %>% # using dropout to reduce overfitting layer_dense(units = 10, activation = "softmax") # predicting the class probabilities ``` The first layer in this network, `layer_flatten()`, transforms the format of the images from a 2d-array (of 28 by 28 pixels) to a 1d-array of `\(28 \times 28 = 784\)` pixels. Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn, it only reformats the data.
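As a quick sanity check (a minimal sketch, reusing the rescaled `train_images` array created above), flattening a single 28 x 28 image indeed yields a vector of 784 pixel values:

```r
# flattening one 28 x 28 image gives a vector of 28 * 28 = 784 pixel values
length(as.vector(train_images[1, , ]) ) # 784
```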
-- After the pixels are flattened, the network consists of a sequence of two dense layers. These are **densely-connected**, or **fully-connected**, neural layers. The first dense layer has 128 nodes (or neurons). The second (and last) layer is a 10-node softmax layer, which returns an array of 10 probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the 10 clothing classes. --- # Defining a first model ```r summary(model) ``` ``` ## Model: "sequential" ## ________________________________________________________________________________ ## Layer (type) Output Shape Param # ## ================================================================================ ## flatten (Flatten) (None, 784) 0 ## ________________________________________________________________________________ ## dense_1 (Dense) (None, 128) 100480 ## ________________________________________________________________________________ ## dropout (Dropout) (None, 128) 0 ## ________________________________________________________________________________ ## dense (Dense) (None, 10) 1290 ## ================================================================================ ## Total params: 101,770 ## Trainable params: 101,770 ## Non-trainable params: 0 ## ________________________________________________________________________________ ``` --- # Compiling the model The **loss function** measures how accurate the model is during training. We want to minimise this function to "steer" the model in the right direction. The **optimiser** specifies *how* the model is updated based on the data it sees and its loss function. ```r model %>% compile( loss = "sparse_categorical_crossentropy", # loss function to be minimised optimizer = "adam", # how the model is updated metrics = "accuracy" # used to monitor the training ) ``` --- # Training the model We train the model on the training data for 10 epochs, keeping 20% of the training set for validation. ```r history <- model %>% fit( x = train_images, y = train_labels, epochs = 10, validation_split = 0.2, verbose = 2 ) ``` --- # Plotting the training history ```r plot(history) ``` <img src="vendredi_quantis_files/figure-html/predict-model1-1.svg" width="75%" style="display: block; margin: auto;" /> --- # Evaluating accuracy We then evaluate the accuracy of the predictions on the testing data. ```r model %>% evaluate(test_images, test_labels) ``` ``` ## loss accuracy ## 0.3535064 0.8792000 ``` --- # Making predictions With the model trained, we can use it to make predictions about some images from the testing dataset. ```r # predicts the softmax probabilities predictions <- model %>% predict(test_images) # array of ten probabilities (one for each class) predictions[1, ] ``` ``` ## [1] 2.903383e-08 6.454761e-10 1.274260e-11 2.984406e-09 1.510436e-11 ## [6] 1.491865e-03 4.410875e-10 9.791809e-03 8.133487e-08 9.887162e-01 ``` ```r # which class has the maximum probability (R indexing starts at 1, so this corresponds to label 9) which.max(predictions[1, ]) ``` ``` ## [1] 10 ``` ```r # directly predicts the class class_pred <- model %>% predict_classes(test_images) class_pred[1:20] ``` ``` ## [1] 9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 2 8 0 ``` --- # Making predictions Let's plot several test images with their predicted class, as sketched below. Correct prediction labels are green and incorrect prediction labels are red.
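The plotting code is not shown in the slides; here is a minimal base-R sketch (assuming the `predictions`, `test_images`, `test_labels`, and `class_names` objects created above) that would produce a similar grid of images:

```r
# plotting the first 25 test images with their predicted class, coloured in
# green when the prediction matches the true label and in red otherwise
par(mfcol = c(5, 5), mar = c(0, 0, 2, 0) )

for (i in 1:25) {
  img <- t(apply(test_images[i, , ], 2, rev) )
  predicted_label <- which.max(predictions[i, ]) - 1
  true_label <- test_labels[i]
  image(x = 1:28, y = 1:28, z = img, col = gray((0:255) / 255), xaxt = "n", yaxt = "n")
  title(
    main = class_names[predicted_label + 1],
    col.main = ifelse(predicted_label == true_label, "darkgreen", "red")
    )
}
```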
<img src="vendredi_quantis_files/figure-html/unnamed-chunk-32-1.svg" width="66%" style="display: block; margin: auto;" /> --- # Making predictions Finally, we use the trained model to make a prediction about a single image. ```r # picks an image from the test dataset (pay attention to the batch dimension) img <- test_images[1, , , drop = FALSE] dim(img) ``` ``` ## [1] 1 28 28 ``` ```r # directly retrieves the predicted class class_pred <- model %>% predict_classes(img) class_pred ``` ``` ## [1] 9 ``` ```r # retrieves the corresponding label class_names[class_pred + 1] ``` ``` ## [1] "Ankle boot" ``` --- class: middle, center # Worked example #2: Surface EMG signals classification using a 1d CNN --- # Importing data Importing the data from [Nalborczyk et al. (2020, PLOS ONE)](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0233282). ```r # loading input features x_reshaped <- readRDS("data/x.rds") # (6 reps x 20 words x 22 participants) x 1 sec x 2 muscles dim(x_reshaped) ``` ``` ## [1] 2640 1000 2 ``` ```r # loading labels y <- readRDS("data/y.rds") # 6 reps x 20 words x 22 participants length(y) ``` ``` ## [1] 2640 ``` --- # Visualising the EMG data For each trial (1 sec), we have the EMG amplitude recorded over two facial muscles: the orbicularis oris inferior (OOI) and the zygomaticus major (ZYG) muscles. <img src="vendredi_quantis_files/figure-html/emg-1.svg" width="75%" style="display: block; margin: auto;" /> --- # Visualising the EMG data <img src="figures/emg.png" width="50%" style="display: block; margin: auto;" /> --- # Reshaping the data ```r # train/test split (80%) b <- 0.8 * nrow(x_reshaped) x_train <- x_reshaped[1:b, , ] x_test <- x_reshaped[(b + 1):nrow(x_reshaped), , ] c(dim(x_train), dim(x_test) ) ``` ``` ## [1] 2112 1000 2 528 1000 2 ``` ```r # dummy encoding of labels num_classes <- n_distinct(y) %>% as.numeric y_categ <- to_categorical(y = y, num_classes = num_classes) # train/test split y_train <- y_categ[1:b, ] y_test <- y_categ[(b + 1):nrow(y_categ), ] ``` --- # Creating the model ```r # input_shape should be [samples, time_steps, features] model <- keras_model_sequential() model %>% layer_conv_1d( filters = 40, kernel_size = 10, strides = 2, padding = "same", activation = "relu", input_shape = c(dim(x_reshaped)[2], dim(x_reshaped)[3]) ) %>% layer_dropout(rate = 0.2) %>% layer_max_pooling_1d(pool_size = 3) %>% layer_conv_1d( filters = 32, kernel_size = 5, strides = 2, padding = "same", activation = "relu" ) %>% layer_dropout(rate = 0.2) %>% layer_max_pooling_1d(pool_size = 3) %>% layer_global_max_pooling_1d() %>% layer_dense(units = 64, activation = "relu") %>% layer_dropout(rate = 0.3) %>% layer_dense(units = num_classes, activation = "softmax") ``` --- # Creating the model ```r summary(model) ``` ``` ## Model: "sequential_1" ## ________________________________________________________________________________ ## Layer (type) Output Shape Param # ## ================================================================================ ## conv1d_1 (Conv1D) (None, 500, 40) 840 ## ________________________________________________________________________________ ## dropout_3 (Dropout) (None, 500, 40) 0 ## ________________________________________________________________________________ ## max_pooling1d_1 (MaxPooling1D) (None, 166, 40) 0 ## ________________________________________________________________________________ ## conv1d (Conv1D) (None, 83, 32) 6432 ## ________________________________________________________________________________ ## dropout_2 
(Dropout) (None, 83, 32) 0 ## ________________________________________________________________________________ ## max_pooling1d (MaxPooling1D) (None, 27, 32) 0 ## ________________________________________________________________________________ ## global_max_pooling1d (GlobalMaxPool (None, 32) 0 ## ________________________________________________________________________________ ## dense_3 (Dense) (None, 64) 2112 ## ________________________________________________________________________________ ## dropout_1 (Dropout) (None, 64) 0 ## ________________________________________________________________________________ ## dense_2 (Dense) (None, 2) 130 ## ================================================================================ ## Total params: 9,514 ## Trainable params: 9,514 ## Non-trainable params: 0 ## ________________________________________________________________________________ ``` --- # Fitting the model ```r model %>% compile( loss = "categorical_crossentropy", optimizer = "adam", metrics = c("accuracy") ) ``` ```r history <- model %>% fit( x_train, y_train, epochs = 20, batch_size = 10, validation_split = 0.2, # callbacks = list( # callback_early_stopping(monitor = "val_loss", patience = 10, verbose = 1) # ) ) ``` --- # Plotting the evolution of loss during training ```r plot(history) ``` <img src="vendredi_quantis_files/figure-html/unnamed-chunk-39-1.svg" width="75%" style="display: block; margin: auto;" /> --- # Assessing the fit ```r # evaluating the model's predictions model %>% evaluate(x_test, y_test) ``` ``` ## loss accuracy ## 0.3499671 0.8750000 ``` ```r # making predictions predictions <- model %>% predict_classes(x_test) # confusion matrix table(target = y[(b + 1):nrow(y_categ)], prediction = predictions) ``` ``` ## prediction ## target 0 1 ## 0 242 22 ## 1 44 220 ``` --- # Extra utilities ```r # saving the entire model (weights) save_model_hdf5(model, "models/emg_1d_cnn_model_overt.h5") loaded_model <- load_model_hdf5("models/emg_1d_cnn_model_overt.h5") # saving JSON config json_config <- model_to_json(model) writeLines(json_config, "models/emg_1d_cnn_model_config_overt.json") ``` --- # References and further resources The surface EMG data and some R code: https://github.com/lnalborczyk/surface_emg_cnn Introduction to deep learning (full course): https://introtodeeplearning.com Introduction to deep learning (full course): https://sebastianraschka.com/resources/dl-lectures/ RStudio tutorial: https://tensorflow.rstudio.com/tutorials/beginners/basic-ml/tutorial_basic_classification/ Another RStudio tutorial: https://blog.rstudio.com/2018/09/12/getting-started-with-deep-learning-in-r/ Deep learning in R (book): https://www.amazon.com/Deep-Learning-R-Francois-Chollet/dp/161729554X A great introduction to deep learning in R: https://github.com/rstudio-conf-2020/dl-keras-tf --- # Take-home messages <!-- <link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css"> <link rel="stylesheet" href="https://cdn.rawgit.com/jpswalsh/academicons/master/css/academicons.min.css"> <link rel = "stylesheet" href = "css/font-awesome.css"/> <link rel = "stylesheet" href = "css/academicons.css"/> --> * **Deep learning**: A class of machine learning algorithms that use multiple layers to progressively extract higher-level features from the raw input (definition taken from the [Wikipedia](https://en.wikipedia.org/wiki/Deep_learning) article). 
* **The universal approximation theorem**: A network with at least one hidden layer (given enough hidden units) can approximate any continuous function. * **The Keras framework**: User-friendly high-level interface to TensorFlow, available in Python and R. <br> Twitter: [lnalborczyk](https://twitter.com/lnalborczyk) GitHub: [lnalborczyk](https://github.com/lnalborczyk) OSF: [https://osf.io/ba8xt](https://osf.io/ba8xt) Website:
[www.barelysignificant.com](https://www.barelysignificant.com)