All posts in: GPU computing

Both forward propagation and backpropagation boil down to operations on matrices, the most common of which is matrix multiplication. To perform matrix multiplication in a reasonable time, you will need to optimise your algorithms.

There is a simple way to do it on macOS by means of Apple's Accelerate framework. It is actually an umbrella framework for vector-optimized operations:

  • vecLib.framework – Contains vector-optimized interfaces for performing math, big-number, and DSP calculations, among others.
  • vImage.framework – Contains vector-optimized interfaces for manipulating image data.

The cblas_sgemm function can help you reach really high performance.
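To make it concrete what cblas_sgemm computes, here is a minimal sketch of the same operation, C = alpha * A * B + beta * C, written as a plain unoptimised loop for row-major matrices. The name sgemm_reference is hypothetical and used only for illustration; the cblas_sgemm call in the comment shows how the same arguments would map onto the standard CBLAS interface, assuming no transposition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (unoptimised) single-precision GEMM for row-major matrices:
//   C = alpha * A * B + beta * C
// A is M x K, B is K x N, C is M x N.
//
// With Accelerate on macOS (or another CBLAS implementation) the same
// result should come from a call of this shape, with the sizes cast to int:
//   cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
//               M, N, K, alpha, A.data(), K, B.data(), N,
//               beta, C.data(), N);
// sgemm_reference is a hypothetical helper, not part of any library.
void sgemm_reference(std::size_t M, std::size_t N, std::size_t K,
                     float alpha, const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta, std::vector<float>& C) {
    // Check that the buffers match the declared dimensions.
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(C.size() == M * N);

    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

In a neural network, A would typically hold a batch of inputs and B the layer weights, so one such call covers the linear part of a layer's forward pass. The triple loop is only useful for checking results; the point of the optimized library routine is to replace it.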

In fact, vecLib is essentially a port of two libraries, BLAS and LAPACK.

cblas.h and vblas.h are the interfaces to Apple’s implementations of BLAS. You can find reference documentation in BLAS. Additional documentation on the BLAS standard, including reference implementations, can be found on the web starting from the BLAS FAQ page.

clapack.h is the interface to Apple’s implementation of LAPACK. Documentation of the LAPACK interfaces, including reference implementations, can be found on the web starting from the LAPACK FAQ page.

This is a good way to combine your code with a C++ library on both Linux and macOS.
Read More

While working on GPU computing, I started wondering how much GPU memory my code uses.

It turned out that it is difficult to work out how much GPU memory is available and how much is in use on the new macOS Sierra.

You might think that it is as simple as going to the list of devices and then reading “PerformanceStatistics”, which holds the current parameters of the device.

Read More

In this post I would like to cover some of the difficulties that a developer working on GPU computing might encounter.

Here is some background first.

The Khronos Group is a nonprofit organisation that creates open standards for cross-platform technologies, which are then used by all the major players, like Apple, Nvidia, ATI, Intel, ARM, etc. Khronos itself doesn’t develop any software. It acts as a mediator, a forum where standards for different technologies are discussed.

After the members of Khronos agree on an API, each hardware manufacturer starts implementing it for its platform. When the implementation is ready and Khronos certifies it, the manufacturer can state that its product supports the technology. This tells interested developers that they can start using the API in their programs or games.

Khronos’s biggest projects are OpenGL, OpenCL, WebGL, WebCL and many others (Vulkan®, COLLADA™, glTF™, EGL™, OpenSL ES™, OpenMAX™, SPIR™, SYCL™, NNEF™, OpenVX™, Safety Critical, OpenKCam™, OpenVG™, Data Format).

Read More

This post is the first in a series in which I will describe the creation of a neural network that models the behaviour of a small organism. Why model such a system, what functions it will perform and what results I get will be covered in the next posts.

The only thing I want to describe in this post is a solution that helps accelerate the training of neural networks.

Read More