SA-MIRI

numsection VGG

VGG-19 is a deep convolutional network for object recognition, developed and trained by Oxford's Visual Geometry Group (VGG), that achieved very good performance on the ImageNet dataset. For details, see the publication by Karen Simonyan and Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition.

numsection CIFAR-10

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
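Hint: The per-batch class balance described above can be checked directly. The following is a minimal sketch, assuming the Python version of the dataset, where each batch file is a pickled dictionary with a b"labels" entry; the helper name count_labels is only illustrative. Reading the real files also requires NumPy to be installed, because the image data is stored as a NumPy array.

```python
import pickle
from collections import Counter


def count_labels(batch_path):
    """Return a Counter mapping class index -> number of images in one batch file."""
    with open(batch_path, "rb") as f:
        # The batches were pickled with Python 2, so Python 3 needs encoding="bytes",
        # which turns the dictionary keys into byte strings such as b"labels".
        batch = pickle.load(f, encoding="bytes")
    return Counter(batch[b"labels"])
```

Calling count_labels on test_batch should give 1000 for every class, while the counts for a single training batch may differ slightly from class to class.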

In this lab, we will train the VGG-19 network using the CIFAR-10 dataset with Keras.

numsection Run VGG network on your computer

task: Open Jupyter Notebook with Docker and open the keras/vgg-book file.

Run the file, wait for the code to perform a few steps, and record the remaining-time information that Keras reports; then you can stop the execution. If you cannot stop it, kill the container. Please complete the following table about your computer and the execution (currently filled with an example):

Processor (GHz / Cores / Threads)	Step	Remaining time	Accuracy
Intel Core i7 5500U (2.6 GHz / 2 Cores / 4 Threads)	96/50000	42h 36m 21s	0.0312

Hint: If the execution fails, check the Docker docs in order to increase the RAM of your container.
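Hint: To compare remaining times later on, it helps to convert them to seconds. This is a minimal sketch that assumes the time is written as in the example row of the table ("42h 36m 21s"); the function name remaining_seconds is only illustrative, and other Keras versions may print the time in a different format.

```python
import re

# Optional hours, minutes and seconds, e.g. "42h 36m 21s", "5m" or "12m 3s".
_TIME_PATTERN = re.compile(r"^\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*$")


def remaining_seconds(text):
    """Convert a string such as '42h 36m 21s' into a number of seconds."""
    match = _TIME_PATTERN.match(text)
    if not match or not any(match.groups()):
        raise ValueError("unrecognised time format: %r" % text)
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

For the example row, remaining_seconds("42h 36m 21s") returns 153381.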

numsection Preparing files for MinoTauro

task: Download the following repository to your computer:

https://github.com/jorditorresBCN/dlaimet

task: Copy the Keras folder to your MinoTauro Home.

task: Download the following file:

http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

task: Uncompress it, then create a new folder called cifar-10 inside the Keras folder in your MinoTauro Home and move all the uncompressed data into this folder.

task: In the Keras folder, modify the vgg.py file (update the DATASET_DIR value):

DATASET_DIR = "cifar-10"
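Hint: Before submitting a job, you may want to check that the dataset folder is complete. This is a minimal sketch, assuming that the batch files of the Python archive (data_batch_1 to data_batch_5, test_batch and batches.meta) sit directly inside DATASET_DIR; whether vgg.py expects exactly this layout is an assumption, and the helper name missing_files is only illustrative.

```python
import os

# File names contained in the CIFAR-10 Python archive.
# Assumption: they were moved directly into the dataset folder.
EXPECTED_FILES = ["data_batch_%d" % i for i in range(1, 6)] + ["test_batch", "batches.meta"]


def missing_files(dataset_dir):
    """Return the expected CIFAR-10 files that are not present in dataset_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(dataset_dir, name))
    ]
```

Running missing_files("cifar-10") from the Keras folder should return an empty list if everything was copied correctly.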

task: To run Python on MinoTauro, you have to load a few modules in your job file. To load the modules, write:

module purge; module load K80 cuda/8.0 mkl/2017.1 CUDNN/5.1.10-cuda_8.0 intel-opencl/2016 python/3.6.0+_ML

numsection Comparing speeds

Previous considerations:

You can only request 2 or 4 GPUs (0, 1 and 3 are incorrect configurations). You do not have to use all the GPUs you request.
In these experiments the number of cores is not relevant; we will request 16 cores in every job file.

We will use the K80 nodes:

#SBATCH --constraint=k80
#SBATCH --gres gpu:NUMBER_OF_GPUS

In the vgg.py file, the --num_gpu flag controls the number of graphics cards used during training. Modify it depending on the number of GPUs you have requested and the number you need.
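Hint: If you are curious how a flag such as --num_gpu can be read inside a Python script, here is a minimal sketch using argparse from the standard library. It is not taken from vgg.py, whose actual parsing code may differ; the default value of 1 and the validation are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse a --num_gpu flag similar to the one used in this lab (illustrative only)."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing")
    parser.add_argument(
        "--num_gpu",
        type=int,
        default=1,  # assumed default, not necessarily the one in vgg.py
        help="number of GPUs to use during training",
    )
    args = parser.parse_args(argv)
    if args.num_gpu < 1:
        parser.error("--num_gpu must be at least 1")
    return args
```

With this sketch, parse_args(["--num_gpu=2"]).num_gpu is the integer 2, which the training code could then use to decide on how many GPUs to replicate the model.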

task: Complete the following table:

Nodes	GPUs asked / GPUs used	Cores	Job wall time	Step
1	2 / 1	16	15 min	?/50000
1	2 / 2	16	15 min	?/50000
1	4 / 4	16	15 min	?/50000

task: Create a bar chart of used GPUs vs. step, a bar chart of used GPUs vs. remaining time, and a chart of used GPUs vs. accuracy. Explain what conclusions can be drawn from the charts.
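Hint: One possible way to prepare the charts is sketched below. The function speedups relates the steps reached with each number of used GPUs to the run with the fewest GPUs, and plot_bars draws a simple bar chart. Both function names are only illustrative, plot_bars assumes that matplotlib is installed, and the values have to come from your own measurements.

```python
def speedups(steps_by_gpus):
    """Map used GPUs -> steps reached, relative to the run with the fewest GPUs.

    steps_by_gpus is a dict such as {1: steps_1, 2: steps_2, 4: steps_4},
    where every run had the same wall time.
    """
    baseline = steps_by_gpus[min(steps_by_gpus)]
    return {gpus: steps / baseline for gpus, steps in sorted(steps_by_gpus.items())}


def plot_bars(values_by_gpus, ylabel, filename):
    """Save a bar chart of one measured value per number of used GPUs."""
    # Imported here so that speedups() can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    gpus = sorted(values_by_gpus)
    plt.figure()
    plt.bar([str(g) for g in gpus], [values_by_gpus[g] for g in gpus])
    plt.xlabel("Used GPUs")
    plt.ylabel(ylabel)
    plt.savefig(filename)
    plt.close()
```

A speedup close to the number of used GPUs would indicate that the training scales almost linearly; a clearly smaller value suggests overhead from splitting the work across GPUs.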

Hint: Example job file

#!/bin/bash
#SBATCH --job-name=keras_k80
#SBATCH -D .
#SBATCH --output=k80_%j.out
#SBATCH --error=k80_%j.err
#SBATCH --ntasks=1
#SBATCH --gres gpu:2
#SBATCH --cpus-per-task=16
#SBATCH --constraint=k80
#SBATCH --time=00:15:00

module purge; module load K80 cuda/8.0 mkl/2017.1 CUDNN/5.1.10-cuda_8.0 intel-opencl/2016 python/3.6.0+_ML
python vgg.py --num_gpu=1

Hint: Add this to run a job using the reservation queue

#SBATCH --reservation=YOUR_RESERVATION

Hint: Add this to run a job using the debug queue

#SBATCH --partition=debug
#SBATCH --qos=debug
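Hint: The job files for the rows of the table differ only in the number of GPUs asked and used, so they can be generated instead of edited by hand. This is a minimal sketch built from the example job file, using 16 cores as stated in the previous considerations; the function name render_job is only illustrative.

```python
MODULES = (
    "module purge; module load K80 cuda/8.0 mkl/2017.1 "
    "CUDNN/5.1.10-cuda_8.0 intel-opencl/2016 python/3.6.0+_ML"
)

JOB_TEMPLATE = """#!/bin/bash
#SBATCH --job-name=keras_k80
#SBATCH -D .
#SBATCH --output=k80_%j.out
#SBATCH --error=k80_%j.err
#SBATCH --ntasks=1
#SBATCH --gres gpu:{asked}
#SBATCH --cpus-per-task=16
#SBATCH --constraint=k80
#SBATCH --time=00:15:00

{modules}
python vgg.py --num_gpu={used}
"""


def render_job(asked, used):
    """Return the text of a job file that asks for `asked` GPUs and uses `used`."""
    if asked not in (2, 4):
        raise ValueError("only 2 or 4 GPUs can be requested")
    if not 1 <= used <= asked:
        raise ValueError("used GPUs must be between 1 and the number requested")
    return JOB_TEMPLATE.format(asked=asked, used=used, modules=MODULES)
```

For example, writing the result of render_job(2, 1), render_job(2, 2) and render_job(4, 4) to three separate files gives one job file per row of the table.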

numsection Lab report

If you don't have time to finish all the tasks during this lab session, please follow your teacher's instructions on how to create and submit your lab report.

10. Run a DNN on a GPU cluster (MinoTauro)