The Flow Inspector is a tool that helps you review the TensorFlow graph built by your Swift program.

The video below shows the default layout of the Flow Inspector debugger and main interaction process.

The Four Parts of Debugging and the Debugging Tools

There are four parts to the debugging workflow:

- File Navigator – select binary and source files.
- Source section – review your code and select a function to inspect.
- Graph section – review your graph inside Flow Inspector.
- Console output section – review the output and errors of your program.

An alpha version of Flow Inspector is available on GitHub.

Official documentation describes the compilation process:

Once the tensor operations are desugared, a transformation we call “partitioning” extracts the graph operations from the program and builds a new SIL function to represent the tensor code. In addition to removing the tensor operations from the host code, new calls are injected that call into our new runtime library to start up TensorFlow, rendezvous to collect any results, and send/receive values between the host and the tensor program as it runs. The bulk of the Graph Program Extraction transformation itself lives in TFPartition.cpp.

Once the tensor function is formed, it has some transformations applied to it, and is eventually emitted to a TensorFlow graph using the code in TFLowerGraph.cpp. After the TensorFlow graph is formed, we serialize it to a protobuf and encode the bits directly into the executable, making it easy to load at program runtime.

In other words, the final graph is serialized into protobuf bytes and embedded directly into the executable file, which makes it hard to inspect afterwards.

To address that, I made a small debugging tool: Flow Inspector.

You can find package template and readme on my GitHub page.

There are some interesting points:

1) High-level APIs will be provided as a separate SwiftPM package under github.com/tensorflow.

High level APIs were added earlier purely to explore the programming model, not to be usable by anyone. Having high level APIs be part of the stdlib module conveys a wrong message for beta testers, and it has been confusing ever since our open source release.

2) Supporting Python code is one of the priorities:

- Improved Python diagnostics related to member access.
- Improved Python C API functions for binary arithmetic operations.

3) Improved cross-device sends and receives support.

4) Lots of work done around supporting generic @dynamicCallable methods.

5) Deprecated `a.dot(b)` and `⊗` in favor of `matmul(a, b)`.

The Google Brain team has launched a new project, ‘Swift for TensorFlow’.

Swift for TensorFlow is a new way to develop machine learning models. It gives you the power of TensorFlow directly integrated into the Swift programming language.

That means that for the next few months I will work on Kraken’s new API. Join the community and follow the updates.

An online demo of the t-SNE visualization is available here.

Machine learning algorithms have been put to good use in various areas for several years already. Analysis of political events can become one of those areas. For instance, machine learning can be used for predicting voting results, developing mechanisms for clustering decisions, and analyzing the actions of political actors. In this article, I will try to describe the results of research in this area.

Modern machine learning capabilities allow converting and visualizing huge amounts of data. This makes it possible to analyze political parties’ activities by converting the voting instances that took place over 4 years into a self-organizing space of points reflecting the actions of each elected official.

Each politician is represented by 12,000 voting instances. Each voting instance records one of five possible actions: the person was absent, skipped the voting, voted for, voted against, or abstained.

The task is to convert the results of all voting instances into a point in the 3D Euclidean space that will reflect some considered attitude.

The original data was taken from the official website and converted into intermediate data for a neural network.
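To make that intermediate step concrete, here is a toy Python sketch of encoding voting actions as numeric features. The action names and the numeric codes are purely illustrative assumptions, not the encoding used for the real dataset:

```python
import numpy as np

# Hypothetical numeric codes for the five possible actions per voting instance.
ACTION_CODES = {
    "absent": 0.0,
    "skipped": 0.25,
    "for": 1.0,
    "against": -1.0,
    "abstained": 0.5,
}

def encode_votes(votes):
    """Convert a list of action names into a float feature vector."""
    return np.array([ACTION_CODES[v] for v in votes], dtype=np.float32)

# One politician's record over four voting instances.
row = encode_votes(["for", "absent", "against", "abstained"])
```

Stacking one such row per politician yields the matrix that is fed to the neural network.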

Considering the problem definition, it is necessary to represent the 12,000 voting instances as a vector of 2 or 3 dimensions. Humans can reason about 2- or 3-dimensional spaces; imagining higher-dimensional spaces is quite difficult.

Let’s apply an autoencoder to reduce the dimensionality.

The autoencoder is based on two functions:

\(h = e\left(x \right)\) – encoding function;

\(x’ = d(h)\) – decoding function;

The initial vector \(x\) with dimension \(m\) is supplied to the neural network as an input, and the network converts it into the value of the hidden layer \(h\) with dimension \(n\). After that, the decoder part of the network converts the value of the hidden layer \(h\) into an output vector \(x’\) with dimension \(m\), where \(m > n\). That is, the hidden layer \(h\) has a smaller dimension while still representing the full range of the initial data.

An objective cost function is used for training the network:

\(L(x, x’) = L(x, d(e(x)))\)

In other words, the difference between the values of the input and output layers is minimized. A trained network can then compress the initial data down to some dimension \(n\) at the hidden layer \(h\).
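To make the shapes concrete, here is a minimal numpy sketch of the \(e\)/\(d\) pair. The weights are randomly initialized rather than trained, and the dimensions \(m\) and \(n\) are illustrative; the point is only the shape flow \(m \to n \to m\):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 12, 3  # input dimension m, hidden dimension n, with m > n

# Randomly initialized weights; in practice these are learned by
# minimizing the reconstruction loss L(x, d(e(x))).
W_enc = rng.normal(size=(m, n))
W_dec = rng.normal(size=(n, m))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    return sigmoid(x @ W_enc)   # h = e(x), dimension n

def decode(h):
    return sigmoid(h @ W_dec)   # x' = d(h), dimension m

x = rng.normal(size=(m,))
h = encode(x)                   # compressed representation
x_prime = decode(h)             # reconstruction
loss = np.mean((x - x_prime) ** 2)  # reconstruction error to minimize
```

After training, it is the values of `h` (not `x_prime`) that we keep as the compressed data.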

In the figure, you can see one input layer, one hidden layer, and one output layer. In a real-case scenario there can be more layers.

Now that we are finished with the theoretical part, let’s do some practice.

The data has been collected from the official site in the JSON format, and encoded into a vector already.

Now there is a dataset of dimension 24000 × 453. Let’s create a neural network using TensorFlow:

```python
# Building the encoder
def encoder(x):
    with tf.variable_scope('encoder', reuse=False):
        with tf.variable_scope('layer_1', reuse=False):
            w1 = tf.Variable(tf.random_normal([num_input, num_hidden_1]), name="w1")
            b1 = tf.Variable(tf.random_normal([num_hidden_1]), name="b1")
            layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, w1), b1))
        with tf.variable_scope('layer_2', reuse=False):
            w2 = tf.Variable(tf.random_normal([num_hidden_1, num_hidden_2]), name="w2")
            b2 = tf.Variable(tf.random_normal([num_hidden_2]), name="b2")
            layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, w2), b2))
        with tf.variable_scope('layer_3', reuse=False):
            w3 = tf.Variable(tf.random_normal([num_hidden_2, num_hidden_3]), name="w3")
            b3 = tf.Variable(tf.random_normal([num_hidden_3]), name="b3")
            layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, w3), b3))
    return layer_3

# Building the decoder
def decoder(x):
    with tf.variable_scope('decoder', reuse=False):
        with tf.variable_scope('layer_1', reuse=False):
            w1 = tf.Variable(tf.random_normal([num_hidden_3, num_hidden_2]), name="w1")
            b1 = tf.Variable(tf.random_normal([num_hidden_2]), name="b1")
            layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, w1), b1))
        with tf.variable_scope('layer_2', reuse=False):
            w2 = tf.Variable(tf.random_normal([num_hidden_2, num_hidden_1]), name="w2")
            b2 = tf.Variable(tf.random_normal([num_hidden_1]), name="b2")
            layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, w2), b2))
        with tf.variable_scope('layer_3', reuse=False):
            w3 = tf.Variable(tf.random_normal([num_hidden_1, num_input]), name="w3")
            b3 = tf.Variable(tf.random_normal([num_input]), name="b3")
            layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, w3), b3))
    return layer_3

# Construct model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)

# Prediction
y_pred = decoder_op
# Targets (Labels) are the input data.
y_true = X
```

All project code available on GitHub page.

The network will be trained with the RMSProp optimizer at a learning rate of 0.01. As a result, you can see the TensorFlow operation graph:

For extra testing, let’s take the first four vectors and render their values as images at the neural network’s input and output. This way you can verify that the values of the input and output layers are “identical” (within a tolerance).
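The “identical within a tolerance” check is essentially what `numpy.allclose` does; here is a toy sketch with made-up values standing in for one input vector and its reconstruction:

```python
import numpy as np

# Made-up pixel-like values: an input vector and its reconstruction.
x = np.array([0.12, 0.80, 0.33])
x_reconstructed = np.array([0.13, 0.79, 0.34])

# True if every component matches within the absolute tolerance.
identical = np.allclose(x, x_reconstructed, atol=0.05)
```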

Now let’s pass all the input data through the neural network step by step and extract the values of the hidden layer. These values are the compressed data we are after. I also tried different layer configurations and chose the one that produced the minimum error. Below is the diagram of the training benchmark.

At this stage, you have 450 vectors of dimension 128. This result is quite good, but it is not good enough to hand over to a human, so let’s go deeper. Let’s use the PCA and t-SNE approaches to reduce the dimensionality further. There are many articles devoted to the principal component analysis method (*PCA*), so I won’t describe it here; however, I would like to tell you about the t-SNE approach. The original paper, **Visualizing Data Using t-SNE**, contains a detailed description of the algorithm; as an example, I will take reducing a two-dimensional space to a one-dimensional one.

There is a 2D space and three classes (A, B, and C) located within this space. Let’s try to project the classes to one of the axes.

As you can see, neither axis gives us a clear picture of the initial classes. The classes get mixed up and, as a result, lose their initial separation. The task is to arrange the elements in the target space so that they maintain the distance ratios they had in the initial space: elements that were close to each other should remain closer than those located farther apart.

Let’s express the relation between datapoints as Euclidean distances: \(\|x_i - x_j\|\) between points in the initial space and \(\|y_i - y_j\|\) between the corresponding points in the target space.

Let’s define conditional probabilities that represent similarities of points in the initial space:

\(p_{ij}=\frac{\exp(-\|x_i - x_j\|^2 / 2\sigma^2)}{\sum_{k \neq l} \exp(-\|x_k - x_l\|^2 / 2\sigma^2)}\)

This expression shows how close the point \(x_j\) is to \(x_i\), assuming that the distances to the neighboring datapoints follow a Gaussian distribution centered at \(x_i\) with a given variance \(\sigma\). The variance is unique for each datapoint and is determined separately, based on the assumption that points in denser regions have lower variance.

Now let’s describe the similarity of the corresponding datapoints \(y_i\) and \(y_j\) in the new space:

\(q_{ij}=\frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l}(1 + \|y_k - y_l\|^2)^{-1}}\)

Again, since we are only interested in modeling pairwise similarities, we set \(q_{ii} = 0\).

If the map points \(y_i\) and \(y_j\) correctly model the similarity between the high-dimensional datapoints \(x_i\) and \(x_j\), the conditional probabilities \(p_{ij}\) and \(q_{ij}\) will be equal. Motivated by this observation, SNE aims to find a low-dimensional data representation that minimizes the mismatch between \(p_{ij}\) and \(q_{ij}\) .
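A small numpy sketch of these quantities, using random points and a single shared \(\sigma\) purely for illustration (real t-SNE uses a per-point \(\sigma_i\) and minimizes the mismatch with gradient descent):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 10))   # high-dimensional datapoints x_i
Y = rng.normal(size=(5, 2))    # low-dimensional map points y_i

def pairwise_sq_dists(Z):
    """Matrix of squared Euclidean distances ||z_i - z_j||^2."""
    diff = Z[:, None, :] - Z[None, :, :]
    return (diff ** 2).sum(-1)

# Similarities in the initial space: Gaussian kernel, shared sigma.
sigma = 1.0
P = np.exp(-pairwise_sq_dists(X) / (2 * sigma ** 2))
np.fill_diagonal(P, 0.0)       # p_ii = 0: only pairwise similarities matter
P /= P.sum()

# Similarities in the map space: heavy-tailed Student-t kernel.
Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
np.fill_diagonal(Q, 0.0)       # q_ii = 0
Q /= Q.sum()

# The mismatch t-SNE minimizes: KL divergence between P and Q.
mask = P > 0
kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
```

Moving the rows of `Y` so that `kl` shrinks is exactly what the optimization does.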

The algorithm finds the variance for Gaussian distribution over each datapoint \(x_i\). It is not likely that there is a single value of \(\sigma_i \) that is optimal for all datapoints in the data set because the density of the data is likely to vary. In dense regions, a smaller value of \(\sigma_i \) is usually more appropriate than in sparser regions.

SNE performs a binary search for the value of \(\sigma_i\). The search is guided by a measure of the effective number of neighbors (the perplexity parameter) that should be taken into account when computing the conditional probabilities.
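The binary search can be sketched as follows. This is a simplified single-point version under illustrative assumptions (random squared distances, a wide search bracket); real implementations do this per datapoint and usually work with the entropy directly:

```python
import numpy as np

def perplexity(sq_dists, sigma):
    """Perplexity 2^H(P) of the distribution induced by one datapoint's distances."""
    p = np.exp(-sq_dists / (2 * sigma ** 2))
    p /= p.sum()
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return 2.0 ** entropy

def find_sigma(sq_dists, target=30.0, tol=1e-4, iters=50):
    """Binary-search sigma so the perplexity matches the target value."""
    lo, hi = 1e-10, 1e10
    for _ in range(iters):
        mid = (lo + hi) / 2
        if perplexity(sq_dists, mid) > target:
            hi = mid   # too many effective neighbors: shrink sigma
        else:
            lo = mid   # too few effective neighbors: grow sigma
        if hi - lo < tol:
            break
    return (lo + hi) / 2

rng = np.random.default_rng(2)
d = rng.uniform(0.1, 5.0, size=100)  # squared distances to the other points
sigma = find_sigma(d, target=30.0)
```

Perplexity grows monotonically with \(\sigma\) (a flatter Gaussian spreads probability over more neighbors), which is what makes the binary search valid.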

The authors of this algorithm drew an analogy from physics, describing the algorithm as a set of objects connected by springs that can repel and attract other objects. If the system is left undisturbed for some time, it finds a stationary point by balancing the tension of all the springs.

The difference between the SNE and t-SNE algorithm is that t-SNE uses a Student-t distribution (also known as t-Distribution, t-Student distribution) rather than a Gaussian, and a symmetrized version of the SNE cost function.

That is, at first the algorithm places all the initial objects in the lower-dimensional space. After that it moves them object by object, based on the distances between them (which objects were closer and which farther) in the initial space.

There is no need to implement such algorithms yourself nowadays. You can use ready-made mathematical packages such as scikit-learn, MATLAB, or TensorFlow.

In my previous article, I mentioned that the TensorFlow toolkit contains a package for visualizing data and the training process, called TensorBoard. Let’s use this solution.

""" Projector realisation for data visualisation. Author: Volodymyr Pavliukevych. """ import os import numpy as np import tensorflow as tf from tensorflow.contrib.tensorboard.plugins import projector # Create datasets first_D = 23998 # Number of items (size). second_D = 11999 # Number of items (size). DATA_DIR = '' LOG_DIR = DATA_DIR + 'embedding/' # Load data from autoencoder. first_rada_input = np.loadtxt(DATA_DIR + 'result_' + str(first_D) + '/rada_full_packed.tsv', delimiter='\t') second_rada_input = np.loadtxt(DATA_DIR + 'result_' + str(second_D) + '/rada_full_packed.tsv', delimiter='\t') # Create variables. first_embedding_var = tf.Variable(first_rada_input, name='politicians_embedding_' + str(first_D)) second_embedding_var = tf.Variable(second_rada_input, name='politicians_embedding_' + str(second_D)) saver = tf.train.Saver() with tf.Session() as session: session.run(tf.global_variables_initializer()) saver.save(session, os.path.join(LOG_DIR, "model.ckpt"), 0) config = projector.ProjectorConfig() # You can add multiple embeddings. first_embedding = config.embeddings.add() second_embedding = config.embeddings.add() first_embedding.tensor_name = first_embedding_var.name second_embedding.tensor_name = second_embedding_var.name # Link this tensor to its metadata file (e.g. labels). first_embedding.metadata_path = os.path.join(DATA_DIR, '../rada_full_packed_labels.tsv') second_embedding.metadata_path = os.path.join(DATA_DIR, '../rada_full_packed_labels.tsv') # Attach prepared bookmarks. first_embedding.bookmarks_path = = os.path.join(DATA_DIR, '../result_23998/bookmarks.txt') second_embedding.bookmarks_path = = os.path.join(DATA_DIR, '../result_11999/bookmarks.txt') # Use the same LOG_DIR where you stored your checkpoint. summary_writer = tf.summary.FileWriter(LOG_DIR) # The next line writes a projector_config.pbtxt in the LOG_DIR. TensorBoard will # read this file during startup. projector.visualize_embeddings(summary_writer, config)

There is another way: an entire portal, the Embedding Projector, that allows you to visualize your dataset directly on a Google server:

- Open the TensorBoard Projector website.
- Click **Load Data**.
- Select our dataset with vectors.
- Add the metadata prepared earlier: labels, classes, etc.
- Enable color map by one of the available columns.
- Optionally, add JSON *.config file and publish data for public view.

Now you can send the link to your analyst.

Those interested in the subject domain may find it useful to view various slices, for example:

- Distribution of votes of politicians from different regions.
- Voting accuracy of different parties.
- Distribution of voting of politicians from one party.
- Similarity of voting of politicians from different parties.

- Autoencoders are a family of simple algorithms that give surprisingly quick and good convergence results.
- Automatic clustering does not answer the question about the nature of the initial data and requires further analysis; however, it provides a quick and clear vector that allows you to start working with your data.
- TensorFlow and TensorBoard are powerful and fast-evolving tools for machine learning that allow solving tasks of diverse complexity.

When I started working in the field of machine learning, it was quite difficult to move to vectors and spaces from objects and their behavior. At first it was rather complicated to wrap my head around all that, and most processes did not seem obvious and clear at once. That’s the reason why I did my best to visualize everything I did in my groundwork: I used to create 3D models, graphs, diagrams, figures, etc.

When speaking about efficient development of machine learning systems, such problems as learning-rate control, learning process analysis, and gathering various learning metrics are usually mentioned. The major difficulty is that we (people) use 2D and 3D spaces to describe the processes that take place around us, while the processes within neural networks live in multidimensional spaces, which makes them rather difficult to understand. Engineers all around the world understand this problem and try to develop various approaches to visualizing or converting multidimensional data into simpler and more understandable forms.

There are separate communities dedicated to solving such problems, for example, Distill, Welch Labs, 3Blue1Brown.

Before I started working with TensorFlow, I used the TensorBoard package. It turned out to be a handy cross-platform solution for visualizing different kinds of data. I spent a couple of days “teaching” the Swift application to create reports in the TensorBoard format and integrate them into my neural network.

Development of TensorBoard started in mid-2015 in one of Google’s laboratories. At the end of 2015, Google released the source code and the project became open source.

The current version of TensorBoard is a Python package created using TensorFlow, and it allows visualization of the following kinds of data:

- Scalar data over time, with a smoothing option
- Images in case you can represent your data in 2D, for example, convolutional network weights (filters)
- Actual computational graph (as an interactive view)
- 2D modifications of tensor values over time
- 3D histogram-modification of data allocation within tensor over time
- Text
- Audio

Besides, there is a projector and a possibility to extend TensorBoard using plugins, but that is a topic for another article.

You need to install TensorBoard on your computer (Ubuntu or Mac) to get started.

You also need Python 3 installed. I recommend installing TensorBoard as part of the TensorFlow package for Python.

Linux:

```shell
$ sudo apt-get install python3-pip python3-dev
$ pip3 install tensorflow
```

macOS:

```shell
$ brew install python3
$ pip3 install tensorflow
```

Now run TensorBoard after specifying a directory for storing reports:

$ tensorboard --logdir /tmp/example/

Let’s open http://localhost:6006/.

The example is available on GitHub. Remember to star my repository.

Now let’s walk through some cases using an example. Reports (summaries) in the TensorBoard format are created while the computational graph is being built. In TensorFlowKit, I did my best to mirror the Python approach and interface, so that shared documentation can be used in the future. As I mentioned earlier, each report is added to a summary: a container holding an array of values, each of which represents an event we want to visualize. Later on, the summary is saved to a file in the file system, where TensorBoard can read it.

So, we need to create a FileWriter, specifying the graph we are going to visualize, and create a summary that will hold our values.

```swift
let summary = Summary(scope: scope)
let fileWriter = try FileWriter(folder: writerURL, identifier: "iMac", graph: graph)
```

After running the application and refreshing the page we will see the graph we’ve built in the code. It will be interactive, so we can navigate it.

Also, we want to see changes of some scalar value over time, for example, the value of the loss function and the accuracy of our neural network. To do that, let’s add output of the operations to the summary:

```swift
try summary.scalar(output: accuracy, key: "scalar-accuracy")
try summary.scalar(output: cross_entropy, key: "scalar-loss")
```

So, after each computation step of our session, TensorFlow automatically extracts the values of our operations and passes them to the input of the resulting summary, which is then saved by FileWriter (I will show how a bit later).

There are a lot of weights and biases in our neural network. Usually these are high-dimensional matrices, and it is quite difficult to analyze their values by printing them out. It’s better to create a distribution diagram. Let’s also add to our Summary information about the weight updates that our network makes during the learning process.

```swift
try summary.histogram(output: bias.output, key: "bias")
try summary.histogram(output: weights.output, key: "weights")
try summary.histogram(output: gradientsOutputs[0], key: "GradientDescentW")
try summary.histogram(output: gradientsOutputs[1], key: "GradientDescentB")
```

Now we have a visualization of the weights and of how they change during the learning process.

However, that is not all. Let’s take a look at the organization of our neural network. Each handwritten digit received as input is reflected in the corresponding weights. That is, the input digit activates certain neurons and this way leaves a mark in our network. Let me remind you that we have 784 weights for each of the 10 output neurons, i.e. 7840 weights in total, represented as a 784×10 matrix. Let’s try to unroll the matrix into a vector and then extract the weights that correspond to each class.
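In numpy terms, the flattening and per-class extraction look like this (a sketch of the same idea, not the TensorFlowKit API; the weight values are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(784, 10))   # one 784-pixel weight "image" per class

# Unroll the 784x10 matrix into a 7840-element vector (row-major order).
flat = W.reshape(7840)

# The weights for class k are every 10th element starting at offset k,
# i.e. a strided slice that recovers column k of the original matrix.
class_2 = flat[2::10]

# Reshape that slice back into a 28x28 picture of the class-2 weights.
image_2 = class_2.reshape(28, 28)
```

This pair of operations is the numpy analogue of the *stridedSlice* and *reshape* steps described below.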

```swift
let flattenConst = try scope.addConst(values: [Int64(7840)], dimensions: [1], as: "flattenShapeConst")
let imagesFlattenTensor = try scope.reshape(operationName: "FlattenReshape",
                                            tensor: weights.variable,
                                            shape: flattenConst.defaultOutput,
                                            tshape: Int64.self)
try extractImage(from: imagesFlattenTensor, scope: scope, summary: summary, atIndex: 0)
try extractImage(from: imagesFlattenTensor, scope: scope, summary: summary, atIndex: 1)
…
try extractImage(from: imagesFlattenTensor, scope: scope, summary: summary, atIndex: 8)
try extractImage(from: imagesFlattenTensor, scope: scope, summary: summary, atIndex: 9)
```

To do that, let’s add a couple of operations to our graph: *stridedSlice* and *reshape*.

Now let’s add each vector we get into the Summary as an image.

```swift
try summary.images(name: "Image-\(String(index))",
                   output: imagesTensor,
                   maxImages: 255,
                   badColor: Summary.BadColor.default)
```

In the Images section of TensorBoard, we can see the weights’ “imprints” as they evolved during the learning process.

Now let’s process our Summary. To do that, we need to merge all the created summaries into one and evaluate it while training the network.

let _ = try summary.merged(identifier: "simple")

While the network works:

```swift
let resultOutput = try session.run(inputs: [x, y],
                                   values: [xTensorInput, yTensorInput],
                                   outputs: [loss, applyGradW, applyGradB, mergedSummary, accuracy],
                                   targetOperations: [])
let summary = resultOutput[3]
try fileWriter?.addSummary(tensor: summary, step: Int64(index))
```

*Please keep in mind that I did not address the accuracy calculation properly: it is computed on the training data here, and it is not correct to evaluate accuracy on the data the network was trained on.*

In the next article I will explain how to build a neural network and run it on Ubuntu, macOS, and iOS from a single source.


Please read my previous post about Swift & TensorFlow.

I took the “Hello World!” of neural networks as an example: the task of classifying MNIST images. The MNIST dataset includes thousands of images of handwritten digits, each 28×28 pixels. So we have ten classes, neatly divided into 60,000 images for training and 10,000 images for testing. Our task is to create a neural network that can classify an image and determine which of the 10 classes it belongs to.

Before you can start working with TensorFlowKit, you need to install TensorFlow. On macOS, you can use the *brew* package manager:

$ brew install libtensorflow

Assembly for Linux is available here.

Let’s create a Swift project and add a dependency:

```swift
dependencies: [
    .package(url: "https://github.com/Octadero/TensorFlow.git", from: "0.0.7")
]
```

Now we should prepare the MNIST dataset.

I have written a Swift package for working with the MNIST dataset that you can find here. This package will download the dataset to a temporary folder, unpack it, and represent it as ready-to-use classes.

For example:

```swift
dataset = MNISTDataset(callback: { (error: Error?) in
    print("Ready")
})
```

Now let’s create the required operation graph.

The space and subspaces of the computation graph are called scopes, and each can have its own name. We’ll provide two vectors as network input. The first one contains the images, each represented as a 784-dimensional vector (28×28 px); each component of the *x* vector holds a Float value in the 0.0–1.0 range corresponding to the color of a pixel in the image. The second vector is the one-hot encoded class (see below), where the component equal to 1 marks the class number. In the following example it’s class 2.

[0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ]
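For reference, producing such a one-hot vector is only a few lines; here is a Python sketch (the function name is my own, not part of any library):

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Encode a class index as a one-hot vector."""
    v = np.zeros(num_classes, dtype=np.float32)
    v[label] = 1.0
    return v

y = one_hot(2)   # class 2, as in the example above
```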

As the input parameters will change during training, let’s create placeholders to refer to them.

```swift
/// Input sub scope
let inputScope = scope.subScope(namespace: "input")
let x = try inputScope.placeholder(operationName: "x-input",
                                   dtype: Float.self,
                                   shape: Shape.dimensions(value: [-1, 784]))
let yLabels = try inputScope.placeholder(operationName: "y-input",
                                         dtype: Float.self,
                                         shape: Shape.dimensions(value: [-1, 10]))
```

That’s how Input looks on the graph:

That is our input layer. Now let’s create weights (connections) between the input and hidden layer.

```swift
let weights = try weightVariable(at: scope, name: "weights", shape: Shape.dimensions(value: [784, 10]))
let bias = try biasVariable(at: scope, name: "biases", shape: Shape.dimensions(value: [10]))
```

We create variable operations in the graph because the weights and biases will be adjusted during training. Let’s initialize them with a tensor filled with zeros.

Now let’s create a hidden layer that performs the primitive operation *(x * W) + b*. This multiplies the vector *x* (dimension 1×784) by the matrix *W* (dimension 784×10) and adds the bias.

In our case the hidden layer is also the output layer (this is a *“Hello World!”*-level task), so we need to analyze the output signal and pick the winner. To do that, we use the softmax operation.
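Softmax itself is simple enough to sketch in a few lines (numpy used for illustration; the scores are made up):

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw outputs of (x * W) + b
probs = softmax(scores)
winner = int(np.argmax(probs))       # the predicted class
```

The largest raw score always gets the largest probability, so picking the winner is just an argmax over the softmax output.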

To better understand what follows, I suggest thinking of our neural network as a complicated function. We feed the vector *x* (representing the image) into the function, and the output is a vector showing the probability that the input belongs to each of the available classes.

Now let’s take the natural logarithm of the predicted probability for each class and multiply it by the value of the correct-class vector passed in at the very beginning (yLabels). This gives us the error value used to “judge” the neural network. The figure below demonstrates two samples: in the first, for class 2 the error value is 2.3; in the second, for class 1 the error value is 0.
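The same calculation in a small Python sketch; the predicted probabilities are made up to reproduce the two samples (a probability of 0.1 for the correct class gives \(-\ln 0.1 \approx 2.3\), a probability of 1.0 gives an error of 0):

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    """-sum(y * log(p)): only the probability of the correct class contributes."""
    return -np.sum(y_true * np.log(y_pred + 1e-12))  # epsilon guards log(0)

y = np.array([0.0, 0.0, 1.0])  # one-hot label: correct class is 2

# Sample 1: the network assigns only 0.1 to the correct class.
bad = cross_entropy(y, np.array([0.45, 0.45, 0.1]))   # about 2.3

# Sample 2: the network is certain about the correct class.
good = cross_entropy(y, np.array([0.0, 0.0, 1.0]))    # about 0
```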

```swift
let log = try scope.log(operationName: "Log", x: softmax)
let mul = try scope.mul(operationName: "Mul", x: yLabels, y: log)
let reductionIndices = try scope.addConst(tensor: Tensor(dimensions: [1], values: [Int(1)]),
                                          as: "reduction_indices").defaultOutput
let sum = try scope.sum(operationName: "Sum",
                        input: mul,
                        reductionIndices: reductionIndices,
                        keepDims: false,
                        tidx: Int32.self)
let neg = try scope.neg(operationName: "Neg", x: sum)
let meanReductionIndices = try scope.addConst(tensor: Tensor(dimensions: [1], values: [Int(0)]),
                                              as: "mean_reduction_indices").defaultOutput
let cross_entropy = try scope.mean(operationName: "Mean",
                                   input: neg,
                                   reductionIndices: meanReductionIndices,
                                   keepDims: false,
                                   tidx: Int32.self)
```

Speaking mathematically, we have to minimize the target function. One way to do that is the gradient descent method. I may describe this method in more detail in another article.
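As a minimal illustration of gradient descent (a toy one-dimensional function, not our network), we repeatedly step against the gradient until we reach the minimum:

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
# The same update rule, w -= learning_rate * gradient, is what
# TensorFlow applies to W and b below.
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)
# w has converged to the minimizer w = 3
```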

So, we should calculate how to adjust each of the weights (components of the *W* matrix) and the bias vector *b* so that the neural network makes a smaller error on similar input data. In mathematical terms, we should find the partial derivatives of the output node with respect to the values of all intermediate nodes. The resulting symbolic gradients let us “shift” the values of the *W* and *b* variables in proportion to how much they contributed to the result of the previous calculations.

**TensorFlow Magic**

The thing is that TensorFlow can perform these complicated calculations automatically (well, most of them at the moment) by analyzing the graph we created.

```swift
let gradientsOutputs = try scope.addGradients(yOutputs: [cross_entropy],
                                              xOutputs: [weights.variable, bias.variable])
```

After this operation call, TensorFlow will create about fifty more operations.

Now it is enough to add an operation that updates the weights with the gradient values we just obtained, using the gradient descent method.

```swift
let _ = try scope.applyGradientDescent(operationName: "applyGradientDescent_W",
                                       `var`: weights.variable,
                                       alpha: learningRate,
                                       delta: gradientsOutputs[0],
                                       useLocking: false)
```

That’s it – the graph is ready!

As I said, TensorFlow separates the model from the calculations, so the graph we created is only a model for performing calculations. We use a Session to start the calculation process. Let’s prepare data from the dataset, place it into tensors, and run the session.

```swift
guard let dataset = dataset else { throw MNISTTestsError.datasetNotReady }
guard let images = dataset.files(for: .image(stride: .train)).first as? MNISTImagesFile else {
    throw MNISTTestsError.datasetNotReady
}
guard let labels = dataset.files(for: .label(stride: .train)).first as? MNISTLabelsFile else {
    throw MNISTTestsError.datasetNotReady
}

let xTensorInput = try Tensor(dimensions: [batch, 784], values: xs)
let yTensorInput = try Tensor(dimensions: [batch, 10], values: ys)
```

It is necessary to run the session several times so that it can iteratively refine the values.

```swift
for index in 0..<1000 {
    let resultOutput = try session.run(inputs: [x, y],
                                       values: [xTensorInput, yTensorInput],
                                       outputs: [loss, applyGradW, applyGradB],
                                       targetOperations: [])
    if index % 100 == 0 {
        let lossTensor = resultOutput[0]
        let gradWTensor = resultOutput[1]
        let gradBTensor = resultOutput[2]
        let wValues: [Float] = try gradWTensor.pullCollection()
        let bValues: [Float] = try gradBTensor.pullCollection()
        let lossValues: [Float] = try lossTensor.pullCollection()
        guard let lossValue = lossValues.first else { continue }
        print("\(index) loss: ", lossValue)
        lossValueResult = lossValue
        print("w max: \(wValues.max()!) min: \(wValues.min()!) b max: \(bValues.max()!) min: \(bValues.min()!)")
    }
}
```

The loss value is printed every 100 steps. In the next article, I will show how to calculate the accuracy of our network and how to visualize it using TensorFlowKit.

I think it is not necessary to explain the meaning of terms such as machine learning and artificial intelligence in 2017. You can find a lot of op-ed articles and research papers on the topic, so I assume the reader is familiar with it and knows the definitions of the basic terms. When talking about machine learning, data scientists and software engineers usually mean deep neural networks, which became quite popular because of their performance. So far there are many software solutions and packages for solving artificial neural network tasks: Caffe, TensorFlow, Torch, Theano (RIP), cuDNN, etc.

Swift is an innovative, protocol-oriented, open source programming language created at Apple by Chris Lattner (who has since left Apple and, after a stint at SpaceX, settled down at Google).

Apple's operating systems have long featured libraries for matrix and vector algebra, such as BLAS, BNNS, and DSP, which were later gathered into the single Accelerate framework.

In 2015, small-scale solutions based on the Metal graphics technology appeared for implementing this math on the GPU.

In 2016, Core ML was introduced:

Core ML can import a finished, trained model (Caffe v1, Keras, scikit-learn) and lets the developer bundle it into an application.

So, first you need to prepare a model on another platform, using Python or C++ and third-party frameworks. Second, you need to train it on a third-party hardware-based solution.

Only after that can you import it and start working with it from Swift. To me, that all seems too complicated.

TensorFlow, like other packages implementing artificial neural networks, provides many ready-made abstractions and mechanisms for working with processing elements, the connections between them, error evaluation, and backpropagation. What sets TensorFlow apart, however, is that Jeff Dean (a Google engineer and co-author of MapReduce, TensorFlow, and many other remarkable systems) decided to build it around the idea of separating the data execution model from the data execution process. It means that you first describe the so-called computation graph, and only then start the calculation process. This approach adds flexibility: by treating the data execution **model** and the data execution **process** as separate concerns, execution can be divided across different units (processors, video cards, computers, and clusters).
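To make the "describe the graph first, run it later" idea concrete, here is a tiny plain-Swift sketch. This is not the TensorFlow API: the `Node` enum and `evaluate` function are invented purely for the illustration. The computation is first built as data, and only then evaluated:

```swift
// A toy computation graph: nodes are data, nothing is computed yet.
indirect enum Node {
    case constant(Float)
    case add(Node, Node)
    case mul(Node, Node)
}

// Evaluation is a separate step that walks the graph.
func evaluate(_ node: Node) -> Float {
    switch node {
    case .constant(let v): return v
    case .add(let a, let b): return evaluate(a) + evaluate(b)
    case .mul(let a, let b): return evaluate(a) * evaluate(b)
    }
}

// Build the graph (the "model") without computing anything.
let graph = Node.add(.mul(.constant(2), .constant(3)), .constant(1))

// Run it (the "process") when and where we choose.
print(evaluate(graph)) // prints 7.0
```

TensorFlow does the same at a much larger scale: because the graph is just data, it can be serialized, optimized, and shipped to another device before a single value is computed.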

To cover all of the mentioned tasks, from preparing a model to working with it in a final application, I have written an interface that provides access to TensorFlow from a single language.

The solution architecture has three levels: low, medium, and high.

- On the low level, a C module allows communicating with libtensorflow from Swift.
- On the medium level, you move away from raw C pointers and work with comprehensible errors.
- The high level implements abstractions for accessing model elements, plus various utilities for exporting, importing, and visualizing graphs.

This way you can create a model (a computation graph) in Swift, train it on an Ubuntu server with several video cards, and then easily open it in your application running on macOS or tvOS. Development can be done in the familiar Xcode, with all its virtues and shortcomings.

Artificial neural networks implement a simplified model of the neural connections found in nervous tissue. An input signal, in the form of a high-dimensional vector, reaches the input layer, which consists of processing elements. Each processing element then transforms the signal based on the properties of the connections (weights) between processing elements and the properties of the processing elements in the following layer, and passes the signal on. During training, an output signal is generated and compared with the expected one; the difference between the actual and expected output determines the error. This error is then used to calculate the so-called gradient: a vector in whose direction the connections between processing elements need to be corrected so that, in the future, the network produces outputs closer to the expected ones. This process is called backpropagation. As a result, the processing elements and the connections between them accumulate the information needed to generalize the properties of the data the network is learning from. Technically, the implementation comes down to various math operations on matrices and vectors, which in turn have already largely been implemented by solutions such as BLAS, LAPACK, DSP, etc.
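As a toy illustration of the ideas above, here is a single processing element with one weight, trained by gradient descent. This is plain Swift, not TensorFlowKit; the weight, learning rate, and samples are all made up for the example. The network learns the mapping y = 2x from the samples:

```swift
// One "processing element" with a single connection weight w.
// Loss for one sample: L = (w*x - y)^2, so dL/dw = 2*(w*x - y)*x.
var w: Float = 0.0
let learningRate: Float = 0.1
let samples: [(x: Float, y: Float)] = [(1, 2), (2, 4), (3, 6)]

for _ in 0..<100 {
    for (x, y) in samples {
        let prediction = w * x       // forward pass
        let error = prediction - y   // compare with the expected signal
        let gradient = 2 * error * x // backpropagated gradient
        w -= learningRate * gradient // correct the connection weight
    }
}
// w is now very close to 2.0 — the network has "learned" y = 2x.
```

The training loop in the MNIST example earlier does exactly this, just with matrices of weights instead of a single scalar, and with TensorFlow computing the gradients inside the session.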

In the next article, I will show how to use TensorFlowKit to solve the MNIST task.

TensorFlowKit is an Octadero Swift package that lets developers easily integrate TensorFlow machine learning models into apps running on macOS and Ubuntu.

The API is based on the TensorFlow library.

- System modules:
  - CTensorFlow – the C API system module;
  - CCTensorFlow – the C++ API system module;
  - CProtobuf – the protobuf library system module.
- Low level:
  - Helper tool:
    - OpPruducer – a Swift-written command-line tool for producing new TensorFlow operations.
- High level:
  - TensorFlowKit – a Swift-written high-level API.

All documentation is available via the links below:

- CAPI – a Swift-written low-level API over the C library;
- Proto – auto-generated Swift classes for TensorFlow structures and models;
- OpPruducer – a Swift-written command-line tool for producing new TensorFlow operations;
- TensorFlowKit – a Swift-written high-level API.

Sources are available on GitHub.
