numsection VGG

VGG-19 is a deep convolutional network for object recognition, developed and trained by Oxford's renowned Visual Geometry Group (VGG), that achieved very strong performance on the ImageNet dataset. For details, see the publication by Karen Simonyan and Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition.
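To get a sense of the network's size before training it, the sketch below (plain Python, no Keras required) tallies the parameters of the standard VGG-19 configuration: sixteen 3x3 convolutional layers plus three fully connected layers for the 1000 ImageNet classes.

```python
# Standard VGG-19 configuration: output channels per 3x3 conv layer,
# with 'M' marking a 2x2 max-pooling layer (no parameters).
CONV_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
            512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
FC_CFG = [4096, 4096, 1000]          # classifier for 1000 ImageNet classes

def vgg19_params(in_channels=3, final_spatial=7):
    total = 0
    c = in_channels
    for layer in CONV_CFG:
        if layer == 'M':
            continue                              # pooling has no weights
        total += 3 * 3 * c * layer + layer        # kernel weights + biases
        c = layer
    features = c * final_spatial ** 2             # 512 * 7 * 7 after pooling
    for units in FC_CFG:
        total += features * units + units         # dense weights + biases
        features = units
    return total

print(vgg19_params())   # 143,667,240 trainable parameters
```

At roughly 144 million parameters, this explains the long per-step times you will measure in the next sections.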

numsection CIFAR-10

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
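Inside each batch file, every image is stored as a flat row of 3072 bytes: the first 1024 are the red channel, the next 1024 green, and the last 1024 blue, each in row-major order. The sketch below uses a synthetic row (so it runs without the dataset files) to show how such a row is reshaped into a 32x32x3 image.

```python
import numpy as np

# Build a synthetic CIFAR-10 row where each channel holds a constant
# value, then reshape it the way the real batch rows must be reshaped.
row = np.concatenate([np.full(1024, v, dtype=np.uint8) for v in (10, 20, 30)])

# (3072,) -> (3, 32, 32) channel-first, then transpose to (32, 32, 3)
image = row.reshape(3, 32, 32).transpose(1, 2, 0)

print(image.shape)   # (32, 32, 3)
print(image[0, 0])   # [10 20 30]: red, green, blue of the top-left pixel
```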

In this lab, we will train the VGG-19 network using the CIFAR-10 dataset with Keras.

numsection Run VGG network on your computer

task: Start the Jupyter notebook with Docker and open the keras/vgg-book file.

Run the file, wait for the code to perform a few steps, and record the remaining-time information that Keras reports; then you can stop the execution. If you cannot stop it, kill the container. Please complete the following table about your computer and the execution (currently filled with an example):

Processor (GHz / Cores / Threads) | Step | Remaining time | Accuracy
--- | --- | --- | ---
Intel Core i7 5500U (2.6 GHz / 2 Cores / 4 Threads) | 96/50000 | 42h 36m 21s | 0.0312
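The remaining-time figure Keras reports is just the measured time per step extrapolated over the steps left. The example row above works out to roughly 3 seconds per step:

```python
# Example row from the table above: step 96 of 50000,
# 42 h 36 m 21 s of estimated time remaining.
steps_done, steps_total = 96, 50000
remaining_s = 42 * 3600 + 36 * 60 + 21        # 153381 seconds

per_step = remaining_s / (steps_total - steps_done)
print(f"{per_step:.2f} s/step")               # about 3.07 s/step

# Total estimate for one full pass if the rate holds from step 0:
total_h = per_step * steps_total / 3600
print(f"{total_h:.1f} h for one full epoch")  # about 42.7 h
```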

Hint: If the execution fails, check the Docker docs in order to increase the RAM of your container.

numsection Preparing files for MinoTauro

task: Download the following repository to your computer:
task: Copy the Keras folder to your MinoTauro Home.
task: Download the following file:
task: Uncompress it; then create a new folder called cifar-10 inside the Keras folder in your MinoTauro home and move all the uncompressed data into this folder.
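The steps above can be sketched as the shell session below. It is a dry run in a scratch directory; on MinoTauro you would run the same tar/mkdir/mv commands from your home with the real download. The archive name cifar-10.tar.gz and the extracted folder name cifar-10-batches-py are assumptions, so check them against your actual files.

```shell
# Dry-run sketch in a scratch directory.
work=$(mktemp -d)
cd "$work"

# Stand-in for the real download: a tiny archive with one batch file.
mkdir cifar-10-batches-py
echo demo > cifar-10-batches-py/data_batch_1
tar -czf cifar-10.tar.gz cifar-10-batches-py
rm -r cifar-10-batches-py

# The actual task steps:
tar -xzf cifar-10.tar.gz                   # uncompress the archive
mkdir -p Keras/cifar-10                    # folder inside the Keras dir
mv cifar-10-batches-py/* Keras/cifar-10/   # move the data across
ls Keras/cifar-10                          # sanity check: lists data_batch_1
```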
task: On the Keras folder, modify the file (Update the DATASET_DIR value):
DATASET_DIR = "cifar-10"
task: To run Python on MinoTauro, you have to load a few modules in your job file. To load the modules write:
module purge; module load K80 cuda/8.0 mkl/2017.1 CUDNN/5.1.10-cuda_8.0 intel-opencl/2016 python/3.6.0+_ML

numsection Comparing speeds

Previous considerations:

  • You can only ask for 2 or 4 GPUs (0, 1 and 3 are invalid configurations). You do not have to use all of the GPUs you ask for.
  • In these experiments the number of cores is not relevant; we will request 16 cores in every job file.
  • We will use the K80 nodes:
    #SBATCH --constraint=k80
    #SBATCH --gres=gpu:NUMBER_OF_GPUS
  • In the file, the flag --num_gpu controls the number of graphics cards that will be used during training. Modify it depending on the number of GPUs you have requested and need.
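A flag like --num_gpu is typically read with argparse. The snippet below is a hypothetical sketch of such parsing; the real script in the repository defines its own parser, so treat the names and defaults here as assumptions.

```python
import argparse

# Hypothetical parser sketch for a training script's --num_gpu flag.
parser = argparse.ArgumentParser(description="VGG-19 on CIFAR-10")
parser.add_argument("--num_gpu", type=int, default=1,
                    help="number of GPUs to use for training")

# Equivalent to invoking: python train.py --num_gpu=2
args = parser.parse_args(["--num_gpu", "2"])
print(args.num_gpu)   # 2
```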
task: Complete the following table:
Nodes | GPUs asked / GPUs used | Cores | Job wall time | Step | Remaining time | Accuracy
--- | --- | --- | --- | --- | --- | ---
1 | 2 / 1 | 16 | 15 min | ?/50000 | |
1 | 2 / 2 | 16 | 15 min | ?/50000 | |
1 | 4 / 4 | 16 | 15 min | ?/50000 | |
task: Create three bar charts: used GPUs vs step, used GPUs vs remaining time, and used GPUs vs accuracy. Explain what conclusions can be drawn from the charts.
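One way to produce the charts is matplotlib. The sketch below draws the GPUs-vs-step chart with made-up placeholder values; substitute the numbers you record in the table, and repeat the pattern for the remaining-time and accuracy charts.

```python
import matplotlib
matplotlib.use("Agg")            # headless backend: just write a PNG
import matplotlib.pyplot as plt

# Placeholder results -- replace with the values from your table.
gpus_used = [1, 2, 4]
steps_reached = [500, 900, 1600]   # step reached after 15 min (made up)

plt.bar([str(g) for g in gpus_used], steps_reached)
plt.xlabel("GPUs used")
plt.ylabel("Step reached in 15 min")
plt.title("Used GPUs vs step")
plt.savefig("gpus_vs_step.png")
```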

Hint: Example job file

#!/bin/bash
#SBATCH --job-name=keras_k80
#SBATCH --output=k80_%j.out
#SBATCH --error=k80_%j.err
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=8
#SBATCH --constraint=k80
#SBATCH --time=00:15:00

module purge; module load K80 cuda/8.0 mkl/2017.1 CUDNN/5.1.10-cuda_8.0 intel-opencl/2016 python/3.6.0+_ML
python SCRIPT_NAME.py --num_gpu=1

Hint: Add this to run a job using the reservation queue (replace RESERVATION_NAME with the reservation announced for the lab):

#SBATCH --reservation=RESERVATION_NAME

Hint: Add this to run a job using the debug queue

#SBATCH --partition=debug
#SBATCH --qos=debug

numsection Lab report

If you don’t have time to finish all tasks during this lab session, please, follow the indications of your teacher about how to create your lab report and how to submit it.