Please read my previous post about Swift & TensorFlow first.


I took the “Hello World!” of the neural-network universe as an example: the task of classifying MNIST images. The MNIST dataset consists of thousands of images of handwritten digits, each 28×28 pixels in size. So we have ten classes, neatly divided into 60,000 images for training and 10,000 images for testing. Our task is to create a neural network that can classify an image, i.e. determine which of the 10 classes it belongs to.

Before you can start working with TensorFlowKit, you need to install TensorFlow. On macOS, you can use the Homebrew package manager:
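The install commands were not preserved in this copy; a typical Homebrew invocation (the formula name `libtensorflow` is an assumption — check `brew search tensorflow` for the current name) looks like:

```shell
# Install the TensorFlow C library that TensorFlowKit links against.
# The formula name is an assumption; verify with `brew search tensorflow`.
brew install libtensorflow
```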

A build for Linux is available here.

Let’s create a Swift project and add a dependency:
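The dependency block did not survive in this copy; a `Package.swift` sketch (the repository URL, version, and product name are assumptions — check the TensorFlowKit project page for the exact coordinates) might look like:

```swift
// swift-tools-version:4.0
// Package.swift — repository URL, version, and target names below are
// assumptions; consult the TensorFlowKit README for the real coordinates.
import PackageDescription

let package = Package(
    name: "MNISTClassifier",
    dependencies: [
        .package(url: "https://github.com/Octadero/TensorFlow.git", from: "0.0.7"),
    ],
    targets: [
        .target(name: "MNISTClassifier", dependencies: ["TensorFlowKit"]),
    ]
)
```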

Now we should prepare the MNIST dataset.

I have written a Swift package for working with the MNIST dataset that you can find here. This package will download the dataset to a temporary folder, unpack it, and represent it as ready-to-use classes.

For example:
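The example itself is missing here; the following is only a sketch of how such a package might be used (the module name `MNISTKit`, the `MNISTDataset` type, and its callback shape are assumptions, not the package's documented interface):

```swift
import MNISTKit // module name is an assumption

// Downloads the dataset to a temporary folder, unpacks it, and exposes it
// through ready-to-use classes (the initializer shape is an assumption).
let dataset = MNISTDataset { error in
    if let error = error {
        fatalError("MNIST download failed: \(error)")
    }
    // At this point the images and labels are ready for training.
}
```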

Now let’s create the required operation graph.

A space or subspace of the computation graph is called a scope and can have its own name. We will feed two vectors into the network. The first contains the images, each represented as a 784-dimensional vector (28×28 px); every component of the x vector holds a Float value in the range 0.0–1.0 that corresponds to the color of a pixel in the image. The second vector is the one-hot encoded class label (see below), in which the component equal to 1 marks the class number. In the following example it’s class 2.

Since the input parameters will change during training, we’ll create placeholders to refer to them.
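The listing did not survive in this copy; this is only a sketch of what creating the placeholders might look like (the `Scope`/`placeholder` names and parameters are assumptions modeled on TensorFlowKit’s graph-building style, not its documented API):

```swift
import TensorFlowKit

let scope = Scope()

// Placeholder for input images: batches of 784-component Float vectors.
// Method names and parameters below are assumptions.
let x = try scope.placeholder(operationName: "x",
                              dtype: Float.self,
                              shape: Shape.dimensions(value: [-1, 784]))

// Placeholder for the one-hot encoded labels (10 classes).
let y = try scope.placeholder(operationName: "y",
                              dtype: Float.self,
                              shape: Shape.dimensions(value: [-1, 10]))
```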


That’s how Input looks on the graph:

Graph, input scope.

That is our input layer. Now let’s create weights (connections) between the input and hidden layer.

We will create variable operations in the graph, because the weights and biases will be adjusted during training. Let’s initialize them with tensors filled with zeros.
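The code is missing here; a hedged sketch of what the variable operations might look like (operation names and signatures are assumptions in the spirit of TensorFlowKit, not its documented API):

```swift
// Weight matrix W (784×10) and bias vector b (10), created as graph
// variables and later initialized with zero-filled tensors.
// All names and signatures below are assumptions.
let w = try scope.variableV2(operationName: "W",
                             shape: Shape.dimensions(value: [784, 10]),
                             dtype: Float.self)
let b = try scope.variableV2(operationName: "b",
                             shape: Shape.dimensions(value: [10]),
                             dtype: Float.self)
```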


Weights variable.

Now let’s create a hidden layer that performs the primitive operation (x * W) + b. This operation multiplies vector x (dimension 1×784) by matrix W (dimension 784×10) and adds the bias.
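To see what this layer computes, here it is written out in plain Swift for a single input vector (an illustration of the math only, with tiny generic dimensions rather than the article’s 784×10):

```swift
import Foundation

/// Computes (x * W) + b for one input vector, in plain Swift.
/// In the article x is 1×784 and W is 784×10; here sizes are generic.
func denseLayer(x: [Double], w: [[Double]], bias: [Double]) -> [Double] {
    var output = bias                    // start from the bias vector
    for (xi, row) in zip(x, w) {         // one row of W per input component
        for j in row.indices {
            output[j] += xi * row[j]
        }
    }
    return output
}

let result = denseLayer(x: [1.0, 2.0],
                        w: [[1.0, 0.0],
                            [0.0, 1.0]],
                        bias: [0.5, 0.5])
// result == [1.5, 2.5]
```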

In our case, the hidden layer is also the output layer (this is a “Hello World!”-level task), so we need to analyze the output signal and pick a winner. To do that, we use the softmax operation.
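Softmax itself is easy to write out in plain Swift, as an illustration of the math rather than TensorFlowKit code:

```swift
import Foundation

/// Softmax turns raw scores (logits) into probabilities that sum to 1.
func softmax(_ logits: [Double]) -> [Double] {
    let maxLogit = logits.max() ?? 0          // subtract max for stability
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

let probabilities = softmax([1.0, 2.0, 3.0])
// The probabilities sum to 1, and the class with the largest score wins.
```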

Hidden layer (x * W) + b

To better understand what follows, I suggest thinking of our neural network as one complicated function. We feed vector x (representing the image) into this function, and at the output we get a vector that gives the probability of the input belonging to each of the available classes.

Now let’s take the natural logarithm of the predicted probability for each class and multiply it by the corresponding component of the correct class vector we passed in at the very beginning (yLabel). This gives us the error value, which we use to “judge” the neural network. The figure below demonstrates two samples: in the first, the error value for class 2 is 2.3; in the second, the error value for class 1 is 0.
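The error described above is the cross-entropy. As a plain-Swift illustration of the math (a 3-class toy instead of the article’s 10 classes): with a one-hot label, only the probability predicted for the correct class contributes, so a probability of 0.1 for the right class gives -ln(0.1) ≈ 2.3, and a probability of 1.0 gives 0, matching the two samples in the figure.

```swift
import Foundation

/// Cross-entropy between a one-hot label and predicted probabilities.
func crossEntropy(label: [Double], predicted: [Double]) -> Double {
    return -zip(label, predicted)
        .filter { $0.0 > 0 }              // only the true class contributes
        .map { $0.0 * log($0.1) }
        .reduce(0, +)
}

let sample1 = crossEntropy(label: [0, 0, 1], predicted: [0.3, 0.6, 0.1])
// sample1 ≈ 2.3 — the network gave the true class only 10% probability
let sample2 = crossEntropy(label: [0, 1, 0], predicted: [0.0, 1.0, 0.0])
// sample2 == 0 — the network was certain and right
```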

Softmax example.

What to do next?

In mathematical terms, we have to minimize the target function. To do that, we can use the gradient descent method; if necessary, I will try to describe this method in another article.

So, we should calculate how to correct each of the weights (the components of the W matrix) and the bias vector b so that the neural network makes a smaller error on similar input data. In mathematical terms, we need to find the partial derivatives of the output node with respect to the values of all intermediate nodes. The symbolic gradients we obtain let us “shift” the values of the W and b variables according to how much each one affected the result of the previous calculations.

TensorFlow Magic

The thing is that TensorFlow can perform all of these complicated calculations (well, not quite all of them yet) automatically by analyzing the graph we created.
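The call itself is not preserved in this copy; in spirit it is a single graph-level request for gradients. The name `addGradients` mirrors the underlying C API’s TF_AddGradients, but the exact TensorFlowKit signature here is an assumption, and `loss`, `w`, and `b` are hypothetical names for the graph nodes built earlier:

```swift
// Ask TensorFlow to extend the graph with gradient sub-graphs of the
// loss with respect to W and b. Signature is an assumption.
let gradients = try scope.addGradients(ys: [loss], xs: [w, b])
```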

After this operation call, TensorFlow will create about fifty more operations.

Calculating partial gradients in TensorFlow.

Now it is enough to add operations that update the weights with the values we obtained earlier, using the gradient descent method.
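The update rule itself is simple; as a plain-Swift illustration of what the added operation computes (a toy sketch, not the TensorFlow op):

```swift
/// One gradient-descent update: move each parameter against its gradient.
func gradientDescentStep(weights: [Double],
                         gradients: [Double],
                         learningRate: Double) -> [Double] {
    return zip(weights, gradients).map { w, g in w - learningRate * g }
}

let updated = gradientDescentStep(weights: [0.5, -0.2],
                                  gradients: [0.1, -0.4],
                                  learningRate: 0.5)
// updated == [0.45, 0.0]
```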

That’s it – the graph is ready!


As I said, TensorFlow separates the model from the computation. The graph we created is therefore only a model for performing calculations; to start the computation, we use a Session. Let’s prepare data from the dataset, place it into tensors, and run the session.
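The listing is missing here; a hedged sketch of the session run (type and method names are assumptions modeled on TensorFlowKit’s style, and `xTensor`, `yTensor`, `applyGradW`, and `applyGradB` are hypothetical names for the tensors and update operations built earlier):

```swift
// Create a session over the graph, feed the placeholders, and fetch the
// loss while running the weight-update operations.
// All names and signatures below are assumptions.
let session = try Session(graph: scope.graph, sessionOptions: SessionOptions())
let results = try session.run(inputs: [x, y],
                              values: [xTensor, yTensor],
                              outputs: [loss],
                              targetOperations: [applyGradW, applyGradB])
```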

The session has to be run several times so that the values are recalculated again and again.

The error value is printed after every 100 training steps. In the next article, I will show how to calculate the accuracy of our network and how to visualize it using TensorFlowKit’s tools.

Author: Volodymyr Pavliukevych

Senior Software Engineer, Data Scientist.

Senior Software Engineer, Data Scientist.
