Deep neural networks (DNNs) have been the focus of much research and development in recent years, as the uptake of artificial intelligence across application areas has grown rapidly. Although much research has gone into developing new deep learning models and techniques, making these models computationally affordable and accessible remains a challenge.

One way to accelerate computation in a deep neural network is to use lower numerical precision. This is called quantization (Hubara et al., 2018). In deep learning, quantization reduces both the memory consumption and the computation time of deep neural networks. The benefit arises because floating-point operations are slower and more costly (in terms of power consumption and silicon area) than fixed-point and integer operations. For instance, in a 45 nm process, 32-bit integer multiplication and addition take 3.1 pJ (picojoules) and 0.1 pJ, respectively (Horowitz, 2014). The same operations on 32-bit floating-point values require 3.7 pJ for multiplication and 0.9 pJ for addition. Integer operands also make computation faster. For example, on an Intel Core i7 4770 running at 3.4 GHz, multiplication is more than 3 times faster for fixed-point data types than for floating-point data types (LIMARE, LIMARE).
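
The energy figures quoted above can be turned into a rough per-layer comparison. The following Python sketch counts one multiplication and one addition per multiply-accumulate (MAC) and ignores memory accesses, which is a simplifying assumption. The function name `mac_energy_pj` and the layer size of one million MACs are hypothetical and chosen only for illustration.

```python
# Energy per operation in picojoules for a 45 nm process,
# using the figures quoted in the text (Horowitz, 2014).
ENERGY_PJ = {
    "int32": {"mul": 3.1, "add": 0.1},
    "fp32": {"mul": 3.7, "add": 0.9},
}


def mac_energy_pj(dtype: str, num_macs: int) -> float:
    """Arithmetic energy of num_macs multiply-accumulates (one mul + one add each)."""
    cost = ENERGY_PJ[dtype]
    return num_macs * (cost["mul"] + cost["add"])


# Hypothetical layer size, chosen only for illustration.
num_macs = 1_000_000

int_uj = mac_energy_pj("int32", num_macs) / 1e6  # convert pJ to microjoules
fp_uj = mac_energy_pj("fp32", num_macs) / 1e6

print(f"int32: {int_uj:.2f} uJ, fp32: {fp_uj:.2f} uJ, ratio: {fp_uj / int_uj:.2f}x")
```

Under these assumptions a 32-bit integer MAC costs about 3.2 pJ against about 4.6 pJ for 32-bit floating point, so the floating-point layer spends roughly 1.4 times the arithmetic energy. The comparison covers arithmetic only and says nothing about data movement.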

To benefit from quantization in a neural network, one must use hardware that supports low-precision computation. At the time of writing, there are no commercially available general-purpose processors (CPU or GPU) that can efficiently store and load sub-8-bit parameters of a neural network. General-purpose processors also lack customized hardware for arbitrary-precision computation. Hence, to fully benefit from quantization, one should consider designing custom ASICs. This document provides technical details for BARVINN: a Barrel RISC-V Neural Network Accelerator Engine. BARVINN was designed primarily to fill the need for arbitrary-precision computation in neural networks.
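
To illustrate why sub-8-bit parameters are awkward on byte-addressed general-purpose processors, the sketch below packs 2-bit values into bytes in software. The 2-bit width, the little-end-first layout within each byte, and the helper names `pack_2bit` and `unpack_2bit` are assumptions made for this example. They do not describe BARVINN's memory format.

```python
def pack_2bit(values):
    """Pack unsigned 2-bit values (0..3) into bytes, four values per byte."""
    if len(values) % 4 != 0:
        raise ValueError("number of values must be a multiple of 4")
    packed = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            if not 0 <= v <= 3:
                raise ValueError("each value must fit in 2 bits")
            # Value j occupies bits [2*j, 2*j + 1] of the byte.
            byte |= v << (2 * j)
        packed.append(byte)
    return bytes(packed)


def unpack_2bit(packed):
    """Recover the 2-bit values; every element costs a shift and a mask."""
    return [(byte >> (2 * j)) & 0b11 for byte in packed for j in range(4)]


# Hypothetical quantized weights, for illustration only.
weights = [3, 0, 1, 2, 2, 2, 0, 1]
packed = pack_2bit(weights)

assert unpack_2bit(packed) == weights
print(len(weights), "weights stored in", len(packed), "bytes")
```

Packing reduces storage by a factor of four relative to one byte per weight, but every load then requires extra shift and mask instructions before the value can be used. That per-element overhead is the kind of cost that hardware with native low-precision support can avoid.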

This documentation is intended to help users and developers use BARVINN in their projects or improve it to suit their custom computation needs.