I don't think it is necessary to explain what machine learning and artificial intelligence mean in 2017. There are plenty of op-ed articles and research papers on the subject, so I assume the reader is familiar with it and knows the basic terms. When data scientists and software engineers talk about machine learning, they usually mean deep neural networks, which have become quite popular because of their effectiveness. There are already many software solutions and packages for artificial neural network tasks: Caffe, TensorFlow, Torch, Theano (RIP), cuDNN, etc.
Swift is an innovative, protocol-oriented, open source programming language created at Apple by Chris Lattner (who recently left Apple and, after Tesla, settled down at Google).
Apple's operating systems already ship with libraries for matrix and vector algebra, such as BLAS, BNNS, and vDSP, which were later gathered in the single Accelerate framework.
In 2015, small-scale solutions for implementing math on top of the Metal graphics technology appeared.
In 2017, Core ML was introduced:
Core ML can import a finished, trained model (Caffe v1, Keras, scikit-learn) and lets the developer ship it inside an application.
So, first, you need to prepare a model on another platform using Python or C++ and third-party frameworks. Second, you need to train it on third-party hardware.
Only after that can you import it and start working in Swift. To me, this all seems too complicated.
TensorFlow, like other software packages that implement artificial neural networks, provides many ready-made abstractions and mechanisms for working with processing elements, the connections between them, error evaluation, and backpropagation. What sets TensorFlow apart from other packages is that Jeff Dean (Google employee, author of DFS, TensorFlow, and many other wonderful solutions) decided to build in the idea of separating the description of a computation from its execution. This means that you first describe the so-called computation graph and only then start the calculation process. This approach adds flexibility: because the model description is independent of the way it is run, execution can be distributed across different units (processors, video cards, computers, and clusters).
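The "describe first, run later" idea can be illustrated with a toy graph in plain Swift. This is only a minimal sketch of the principle, not TensorFlow's or TensorFlowKit's API: the `Node` type, the `GraphError` type, and the `run` function are hypothetical names invented for this example.

```swift
// A toy computation graph. All names here are hypothetical and
// are not part of TensorFlow or TensorFlowKit.
indirect enum Node {
    case constant(Double)
    case placeholder(String)   // a value supplied only at run time
    case add(Node, Node)
    case multiply(Node, Node)
}

enum GraphError: Error {
    case missingFeed(String)
}

// Execution step: walks the graph and computes its value
// from the feeds supplied by the caller.
func run(_ node: Node, feeds: [String: Double]) throws -> Double {
    switch node {
    case .constant(let value):
        return value
    case .placeholder(let name):
        guard let value = feeds[name] else {
            throw GraphError.missingFeed(name)
        }
        return value
    case .add(let lhs, let rhs):
        return try run(lhs, feeds: feeds) + run(rhs, feeds: feeds)
    case .multiply(let lhs, let rhs):
        return try run(lhs, feeds: feeds) * run(rhs, feeds: feeds)
    }
}

// Description step: nothing is computed here.
// We only describe the expression y = 2 * x + 1.
let graph: Node = .add(.multiply(.constant(2.0), .placeholder("x")),
                       .constant(1.0))

// The same description can be executed many times with different inputs.
let y1 = try? run(graph, feeds: ["x": 3.0])    // Optional(7.0)
let y2 = try? run(graph, feeds: ["x": -1.0])   // Optional(-1.0)
let y3 = try? run(graph, feeds: [:])           // nil, because "x" was not fed
```

Because the graph is just data, an executor is free to decide where and how each node is computed. That freedom is what makes it possible to spread the work over several devices.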
To cover all of these tasks, from preparing a model to using it in the final application, I have written an interface that provides access to TensorFlow and lets you work with it using a single language.
The solution architecture has three levels: low, medium, and high.
- At the low level, a C module allows Swift code to communicate with libtensorflow.
- At the medium level, you can move away from C pointers and work with “comprehensible errors”.
- The high level implements abstractions for accessing model elements, along with various utilities for exporting, importing, and visualizing graphs.
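To give a feeling for what the medium level does, here is a minimal sketch of hiding a C-style call (result written through a pointer, numeric status code returned) behind a Swift function that throws a readable error. The function `cStyleDivide`, the status code 3, and the `KitError` type are assumptions made up for this illustration. They stand in for a real C API and are not taken from libtensorflow or TensorFlowKit.

```swift
// Stand-in for a C function: writes the result through a pointer
// and reports success or failure as a numeric status code.
// Hypothetical; not a libtensorflow call.
func cStyleDivide(_ a: Int32, _ b: Int32,
                  _ result: UnsafeMutablePointer<Int32>) -> Int32 {
    if b == 0 { return 3 }   // assumed "invalid argument" code
    result.pointee = a / b
    return 0                 // assumed "OK" code
}

// Swift-side error type that replaces raw status codes.
enum KitError: Error {
    case invalidArgument(String)
    case unknown(code: Int32)
}

// Wrapper: callers see neither the pointer nor the status code.
func divide(_ a: Int32, by b: Int32) throws -> Int32 {
    var result: Int32 = 0
    let code = cStyleDivide(a, b, &result)
    switch code {
    case 0:
        return result
    case 3:
        throw KitError.invalidArgument("division by zero")
    default:
        throw KitError.unknown(code: code)
    }
}

let quotient = try? divide(10, by: 2)   // Optional(5)

var message = "no error"
do {
    _ = try divide(1, by: 0)
} catch KitError.invalidArgument(let text) {
    message = text                      // "division by zero"
} catch {
    message = "unexpected error"
}
```

The high level can then be built on wrappers like `divide` without touching unsafe pointers at all.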
This way you can create a model (computation graph) in Swift, train it on a server running Ubuntu with several video cards, and then easily open it in your application running on macOS or tvOS. Development can be done in the familiar Xcode, with all its virtues and shortcomings.
Neural Networks Theory in Brief
Artificial neural networks implement a simplified model of the connections between neurons in nervous tissue. An input signal, in the form of a high-dimensional vector, reaches the input layer, which consists of processing elements. Each processing element then transforms the signal based on the properties (weights) of the connections between processing elements and on the properties of the processing elements in the following layers, and passes the signal on to the next layer. During training, the network's output signal is generated and compared with the expected one. The difference between the actual output and the expected output determines the error. This error is then used to calculate the so-called gradient: a vector that shows how the connections between processing elements must be corrected (the weights are moved against the gradient of the error) so that the network produces signals closer to the expected ones in the future. This process is called backpropagation. In this way, the processing elements and the connections between them accumulate the information needed to generalize the properties of the data on which the network is being trained. The technical implementation comes down to various math operations on matrices and vectors, which, in turn, have already been implemented to a certain extent by solutions such as BLAS, LAPACK, DSP, etc.
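The training loop described above can be sketched with the smallest possible "network": a single linear processing element with one weight and one bias, trained by plain gradient descent. This is an illustration in plain Swift, not TensorFlowKit code. The data, the learning rate, and the number of steps are made-up assumptions. With only one element the chain rule is a single step deep, while real backpropagation repeats the same calculation layer by layer.

```swift
// One training sample: an input and the output expected for it.
typealias Sample = (x: Double, expected: Double)

// Made-up data for the target function y = 2x + 1.
let samples: [Sample] = [
    (x: 0.0, expected: 1.0),
    (x: 1.0, expected: 3.0),
    (x: 2.0, expected: 5.0),
    (x: 3.0, expected: 7.0),
]

// Error rate: mean squared difference between actual and expected output.
func meanSquaredError(_ samples: [Sample],
                      weight: Double, bias: Double) -> Double {
    var total = 0.0
    for sample in samples {
        let difference = (weight * sample.x + bias) - sample.expected
        total += difference * difference
    }
    return total / Double(samples.count)
}

var weight = 0.0
var bias = 0.0
let learningRate = 0.05   // assumed value, chosen for this toy data
let initialError = meanSquaredError(samples, weight: weight, bias: bias)

for _ in 0..<2000 {
    var weightGradient = 0.0
    var biasGradient = 0.0
    for sample in samples {
        let output = weight * sample.x + bias        // forward pass
        let difference = output - sample.expected    // actual vs. expected
        weightGradient += 2 * difference * sample.x  // d(error)/d(weight)
        biasGradient += 2 * difference               // d(error)/d(bias)
    }
    let count = Double(samples.count)
    // Move the parameters against the averaged gradient.
    weight -= learningRate * weightGradient / count
    bias -= learningRate * biasGradient / count
}

let finalError = meanSquaredError(samples, weight: weight, bias: bias)
```

After training, `weight` and `bias` should be close to 2 and 1, and `finalError` should be far smaller than `initialError`. In a real network the inner loop becomes matrix and vector operations, which is where libraries such as BLAS come in.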
In the next article, I will show you how to use TensorFlowKit to solve the MNIST task.
Author: Volodymyr Pavliukevych
Senior Software Engineer, Data Scientist.