Authors / Contributors: Andrea Vedaldi, Mathias Lux, Marco Bertini;
Affiliation: University of Oxford, University of Klagenfurt, University of Florence;
Editors: Mathias Lux and Marco Bertini
MatConvNet is an open source MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision and multimedia applications, developed by the authors of the well-known VLFeat library. Both libraries have associated papers that were presented in the Open Source Software Competition track of ACM Multimedia: “MatConvNet: Convolutional Neural Networks for MATLAB” [1] and “VLFeat: an open and portable library of computer vision algorithms” [2]. At present, the MatConvNet paper is the second most cited ACM Multimedia paper, according to Google Scholar.
The strengths of this toolbox are its simplicity, thanks to the integration with the MATLAB environment; its efficiency, thanks to the use of GPUs and the cuDNN library; and the fact that it can run and train state-of-the-art CNNs. After all, it has been developed by the same people who brought us the VGG-16 and VGG-19 CNNs.
Many pre-trained CNNs for image classification (e.g. ResNet), segmentation, face recognition (e.g. VGG Face), object detection (e.g. Fast R-CNN) and text detection (e.g. VGG Text) are available in the model zoo.
Introduction
Since the 2015 MatConvNet paper, much has changed in the landscape of libraries and frameworks for deep learning. TensorFlow, PyTorch, MXNet and Microsoft Cognitive Toolkit had not even been released at the time, and of the competing libraries cited in the paper essentially only Caffe is still actively developed.
However, the main motivation for the development of MatConvNet still holds: thanks to its integration with the MATLAB environment, this toolbox provides an ease of use that is still unmatched by its competitors. It is possible to download, compile and run the VGG-16 CNN on an image with 10 lines of code. Adding some visualization of the results takes another 3 lines.
Therefore, if using the Python bindings of one of the aforementioned frameworks is not your cup of tea, if you are wary of compiling a GPU-enabled library by modifying Makefiles or relatively arcane build tools (e.g. read how to compile GPU support for TensorFlow on macOS), or if you simply still prefer the MATLAB environment to prototype your research, MatConvNet is the library for you.
Note that while, from a user perspective, MatConvNet currently relies on MATLAB, the library is actually developed with a clean separation between the MATLAB code and the C++ and CUDA core. This means that in the future the library may be extended to allow processing CNNs independently of MATLAB. In any case, because MatConvNet exposes CNN building blocks such as convolution, normalization and pooling as MATLAB commands, new CNN blocks can be written directly in MATLAB, exploiting its CPU and GPU parallelization capabilities and thus allowing rapid development of new CNN architectures and components.
Getting started
Installing MatConvNet is much easier than installing most other libraries: all that is needed is MATLAB and a compatible C++ compiler. This makes the library very suitable as a learning tool. Of course, a GPU and the related CUDA toolkit (with the associated cuDNN library) are strongly recommended.
Compilation just requires this line:
vl_compilenn('enableGpu', true, 'cudaRoot', '/Developer/NVIDIA/CUDA-9.0', 'cudaMethod', 'nvcc', ...
             'enableCudnn', true, 'cudnnRoot', '/Developer/NVIDIA/CUDA-9.0') ;
This line compiles the library with GPU support and indicates where the CUDA Toolkit and cuDNN are installed (in this case the cuDNN header is installed within the CUDA include directory, and the cuDNN libraries are in the CUDA lib directory). Note that it is possible to compile using a CUDA toolkit that is not directly supported by MATLAB (e.g. MATLAB 2017a supports CUDA 8.0). Compilation takes only 1-2 minutes; testing the resulting library takes this line:
vl_testnn('gpu', true)
To add MatConvNet to the MATLAB path, use:
run /path/to/matconvnet/matlab/vl_setupnn
With these three lines you have a running CNN library ready for use.
Let’s immediately use a pre-trained VGG-16 model to classify the content of an image:
net = load('/path/to/matconvnet-models/imagenet-vgg-verydeep-16.mat') ;
% update an old model format to the current MatConvNet format:
net = vl_simplenn_tidy(net) ;
Get and pre-process an image, resizing it and subtracting the ImageNet average as was done when training VGG-16:
% read and preprocess an image.
im = imread('peppers.png') ;
im_ = single(im) ; % note: 255 range
im_ = imresize(im_, net.meta.normalization.imageSize(1:2)) ;
im_ = im_ - net.meta.normalization.averageImage ;
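The average subtraction in the last step relies on the per-channel mean being expanded over all pixels. The following minimal sketch illustrates this with synthetic data; the 2x2 image and the mean values are made up for illustration, and the mean is assumed to be stored as a 1x1x3 array, which may not hold for every model file. bsxfun is used so that the expansion is explicit and also works on MATLAB releases without implicit expansion.

```matlab
% Synthetic 2x2 RGB image: every pixel is (200, 150, 100). Illustrative values only.
im_ = single(repmat(reshape([200 150 100], 1, 1, 3), 2, 2)) ;

% Hypothetical per-channel average, stored as a 1x1x3 array (assumption).
avg = single(reshape([120 110 100], 1, 1, 3)) ;

% Subtract the channel average from every pixel; bsxfun expands avg to 2x2x3.
imNorm = bsxfun(@minus, im_, avg) ;

disp(size(imNorm)) ;          % same size as the input image
disp(squeeze(imNorm(1,1,:))') ; % per-channel values of the first pixel
```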
Then simply run the network with:
res = vl_simplenn(net, im_) ;
Printing the best score and showing the original image takes only a few lines. This is where MATLAB really helps, in addition to the convenience of inspecting the layers of the network and all the variables in the Workspace panel:
scores = squeeze(gather(res(end).x)) ;
[bestScore, best] = max(scores) ;
figure(1) ; clf ; imagesc(im) ;
title(sprintf('%s (%d), score %.3f', net.meta.classes.description{best}, best, bestScore)) ;
Resulting in this nice image:
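Beyond the single best class, it is often useful to list the top few predictions. The sketch below shows one way to do this by sorting the score vector. It uses synthetic stand-ins so that it runs on its own: the scores and descriptions variables here are made-up values, not the output of the network; with a real model they would come from res(end).x and net.meta.classes.description as in the code above.

```matlab
% Synthetic stand-ins (assumptions) for the network scores and class names.
scores = single([0.05 0.60 0.10 0.25]) ;
descriptions = {'tench', 'bell pepper', 'goldfish', 'cucumber'} ;

% Number of predictions to report, capped at the number of classes.
k = min(3, numel(scores)) ;

% Sort the scores in descending order, keeping the class indices.
[sortedScores, order] = sort(scores, 'descend') ;
topClasses = order(1:k) ;

% Print rank, class description, class index and score.
for i = 1:k
  fprintf('%d. %s (%d), score %.3f\n', ...
          i, descriptions{topClasses(i)}, topClasses(i), sortedScores(i)) ;
end
```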
Running more modern networks (e.g. GoogLeNet, ResNet, etc.) requires using the DAG NN model type, rather than the simple NN type seen above. For example, to perform object detection using Fast R-CNN:
net = load('/path/to/matconvnet-models/fast-rcnn-vgg16-pascal07-dagnn.mat') ;
net = dagnn.DagNN.loadobj(net) ;
net.mode = 'test' ;
Prepare the image for processing, as well as the candidate bounding boxes:
% Load a test image and candidate bounding boxes.
im = single(imread('000004.jpg')) ;
imo = im ; % keep original image for plotting final results
boxes = load('000004_boxes.mat') ;
boxes = single(boxes.boxes') + 1 ;
boxeso = boxes - 1 ; % keep original boxes for plotting final results

% Resize images and boxes to a size compatible with the network.
imageSize = size(im) ;
fullImageSize = net.meta.normalization.imageSize(1) / net.meta.normalization.cropSize ;
scale = max(fullImageSize ./ imageSize(1:2)) ;
im = imresize(im, scale, net.meta.normalization.interpolation, 'antialiasing', false) ;
% Remove the average color from the input image: alternative method w.r.t. previous example
imNorm = bsxfun(@minus, im, net.meta.normalization.averageImage) ;
boxes = bsxfun(@times, boxes - 1, scale) + 1 ;

% Convert boxes into ROIs by prepending the image index. There is only one image in this batch...
rois = [ones(1,size(boxes,2)) ; boxes] ;
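The coordinate handling above can be easier to follow with numbers. The following sketch repeats the box steps on two made-up boxes and a made-up scale factor of 2 (both are assumptions for illustration): the 0-based boxes are shifted to 1-based MATLAB coordinates, scaled around the first pixel, and then stacked into a 5xN ROI matrix whose first row is the image index.

```matlab
% Two hypothetical 0-based boxes, one per column: [x1; y1; x2; y2].
boxes0 = single([ 0 10 ;
                  0 20 ;
                 99 59 ;
                 49 79]) ;

% Shift to 1-based coordinates, as MATLAB indexes pixels from 1.
boxes = boxes0 + 1 ;

% Hypothetical resize factor applied to the image (assumption).
scale = 2 ;

% Scale the boxes like the image: go back to 0-based, scale, return to 1-based.
scaled = bsxfun(@times, boxes - 1, scale) + 1 ;

% Prepend the image index (a single image, so index 1 for every box).
rois = [ones(1, size(scaled, 2)) ; scaled] ;
disp(rois) ;
```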
Run the network and get class probabilities and bounding box results (as refinements):
net.eval({'data', imNorm, 'rois', rois});
% Extract class probabilities and bounding box refinements
probs = squeeze(gather(net.vars(net.getVarIndex('cls_prob')).value)) ;
deltas = squeeze(gather(net.vars(net.getVarIndex('bbox_pred')).value)) ;
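The refinements are offsets relative to each candidate box rather than absolute coordinates. As a rough illustration, the sketch below applies the parameterization commonly used by R-CNN-style detectors (center shifts proportional to the box size, and log-scale changes of width and height) to two made-up boxes. The variable names, the input values and the assumption that this is the exact convention of the model above are all illustrative; the MatConvNet demo code has its own helper for this step and also selects the four deltas of each class, which is not shown here.

```matlab
% Hypothetical boxes, one per column: [x1; y1; x2; y2].
boxes = single([10 20 ;
                10 30 ;
                50 80 ;
                90 70]) ;

% Hypothetical deltas for a single class, one column per box: [dx; dy; dw; dh].
% Box 1 is left unchanged; box 2 is shifted right and doubled in width.
deltas = single([0 0.1    ;
                 0 0      ;
                 0 log(2) ;
                 0 0     ]) ;

% Width, height and center of each source box (inclusive pixel coordinates).
w  = boxes(3,:) - boxes(1,:) + 1 ;
h  = boxes(4,:) - boxes(2,:) + 1 ;
cx = boxes(1,:) + 0.5 * (w - 1) ;
cy = boxes(2,:) + 0.5 * (h - 1) ;

% Shift the center by a fraction of the box size; rescale the size exponentially.
newCx = deltas(1,:) .* w + cx ;
newCy = deltas(2,:) .* h + cy ;
newW  = exp(deltas(3,:)) .* w ;
newH  = exp(deltas(4,:)) .* h ;

% Convert center and size back to corner coordinates.
refined = [newCx - 0.5 * (newW - 1) ;
           newCy - 0.5 * (newH - 1) ;
           newCx + 0.5 * (newW - 1) ;
           newCy + 0.5 * (newH - 1)] ;
disp(refined) ;
```

With all-zero deltas a box is returned unchanged, which is a quick sanity check for this kind of decoding.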
The code that plots the bounding boxes and writes the probabilities and boxes for each class is about 20 lines long and, for the sake of space, is not reported here (see the MatConvNet demo code). The resulting image is shown here:
The network and image used in this example are kept in host memory. To use the GPU, it is enough to copy the image array to the GPU using MATLAB gpuArray and to move the network to the GPU as well, with the following code:
imNorm = gpuArray(imNorm) ; % copy imNorm to GPU memory
rois = gpuArray(rois) ;
net.move('gpu') ; % move CNN to GPU
Training CNNs from scratch is also quite easy and efficient, thanks to the vl_imreadjpeg() function, an optimized and parallelized image reader that provides a sustained stream of images to avoid starving the GPU.
Conclusions
In this column, we provided an overview of the MatConvNet library, showing how easy it is to perform image analysis tasks. The library provides a first-class experience to MATLAB users, and its ease of installation makes it very suitable as a learning tool. Researchers who still prefer MATLAB to Python, because they like the experience of this nicely integrated environment, should definitely check it out.
References
[1] VEDALDI A., LENC K., “MatConvNet: Convolutional Neural Networks for MATLAB”, In: ACM International Conference on Multimedia (ACM MM), 2015
[2] VEDALDI A., FULKERSON B., “VLFeat: an open and portable library of computer vision algorithms”, In: ACM International Conference on Multimedia (ACM MM), 2010