numsection Python

Python is a widely used, general-purpose programming language created by Guido van Rossum that supports multiple programming paradigms. Its source code is freely available under the Python Software Foundation License, an open-source license that is compatible with the GNU General Public License (GPL). Because Python is an interpreted rather than a compiled language, it may use more CPU time than a compiled one (an important detail in our Computer Architecture department), but it has a gentle learning curve: its readable syntax lets you become productive quickly. Python is the programming language of choice for our labs, and only Python basics are required to follow them. If you have no prior knowledge of Python, you can learn the required background on your own by following this Python Quick Start.

numsection Docker

Docker is the world's leading software container platform. Developers use Docker to eliminate "works on my machine" problems when collaborating on code with co-workers. Operators use Docker to run and manage apps side by side in isolated containers to get better compute density. Enterprises use Docker to build agile software delivery pipelines that ship new features faster, more securely, and with confidence for both Linux and Windows Server apps. A container image is a lightweight, standalone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, and settings. Available for both Linux- and Windows-based apps, containerised software runs the same way regardless of the environment. Containers isolate software from its surroundings (for example, differences between development and staging environments) and help reduce conflicts between teams running different software on the same infrastructure. In this course, we use Docker to isolate all the frameworks and programs and to avoid configuration problems.

numsection MNIST Dataset

The MNIST dataset is composed of a set of black-and-white images of handwritten digits: 60,000 examples for training a model and 10,000 for testing it. The MNIST dataset can be found at the MNIST database. This dataset is ideal for people who are getting started with pattern recognition on real examples and do not want to spend time on data pre-processing or formatting, two steps that are very important when dealing with images but are time-consuming.

The original black-and-white (bilevel) images were size-normalized to fit in a 20×20 pixel box while preserving their aspect ratio. The resulting images contain gray pixels as a result of the anti-aliasing used by the normalization algorithm (which reduces all the images to the same low resolution). After that, the images were centered in 28×28 pixel frames by computing the center of mass of the pixels and translating the image so that this point lies at the center of the frame. The images are like the ones shown here:

The images are represented as a numerical matrix. For example, one of the images of number 1 can be represented as:

Here, each entry indicates the level of blackness of the corresponding pixel, between 0 and 1. This matrix can also be flattened into a single point in a 784-dimensional vector space (28×28 = 784 numbers).
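
The conversion from a pixel matrix to a 784-dimensional vector can be sketched as follows. This is only an illustration: the function name `flatten_image` and the stand-in array `fake_one` are hypothetical, and the division by 255 assumes the pixels are stored as 8-bit integers in the range 0 to 255 (which is how Keras returns MNIST).

```python
import numpy as np

def flatten_image(image):
    """Scale a 28x28 8-bit image to [0, 1] and flatten it to 784 values."""
    image = np.asarray(image)
    if image.shape != (28, 28):
        raise ValueError("expected a 28x28 image, got {}".format(image.shape))
    scaled = image.astype(np.float32) / 255.0  # 0 = background, 1 = full ink
    return scaled.reshape(-1)                  # row-major order: 28 * 28 = 784

# Hypothetical stand-in for one digit: a vertical stroke, like a crude "1".
fake_one = np.zeros((28, 28), dtype=np.uint8)
fake_one[4:24, 13:15] = 255

vector = flatten_image(fake_one)
print(vector.shape)  # (784,)
```

Pixel (row, column) of the matrix ends up at index `row * 28 + column` of the vector, so no information is lost; only the two-dimensional layout is dropped.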

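The centering step described in this section (compute the center of mass, then move it to the center of the 28×28 frame) can be sketched as below. This is a simplified illustration, not the exact procedure used to build MNIST: the function `center_in_frame` and the sample `digit` are hypothetical, the shift is rounded to whole pixels, and the offset is clamped so that the 20×20 image always fits inside the frame.

```python
import numpy as np

def center_in_frame(digit, frame_size=28):
    """Place a small grayscale image in a square frame so that its center
    of mass lands (to the nearest pixel) on the center of the frame."""
    digit = np.asarray(digit, dtype=np.float64)
    total = digit.sum()
    if total == 0:
        raise ValueError("image is empty, center of mass is undefined")
    rows, cols = np.indices(digit.shape)
    # Intensity-weighted mean position of the ink.
    mass_row = (rows * digit).sum() / total
    mass_col = (cols * digit).sum() / total
    # Offset of the digit's top-left corner inside the frame.
    center = (frame_size - 1) / 2.0
    top = int(round(center - mass_row))
    left = int(round(center - mass_col))
    # Clamp so the digit always fits inside the frame.
    top = min(max(top, 0), frame_size - digit.shape[0])
    left = min(max(left, 0), frame_size - digit.shape[1])
    frame = np.zeros((frame_size, frame_size), dtype=digit.dtype)
    frame[top:top + digit.shape[0], left:left + digit.shape[1]] = digit
    return frame

# Hypothetical 20x20 digit whose ink is slightly off-center.
digit = np.zeros((20, 20))
digit[6:12, 7:11] = 1.0

framed = center_in_frame(digit)
print(framed.shape)  # (28, 28)
```

Because of the clamping, a digit whose ink sits very close to one edge of the 20×20 box is moved as far as the frame allows rather than being perfectly centered.
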
numsection Lab Tasks

task: Install Docker for your platform
task: Download and run the Docker image. Open a terminal (Mac/Linux), cmd or PowerShell (Windows 10 Pro), or the Docker CLI (other Windows versions), and then run:
#Download the docker image
docker pull jorditorresbcn/dl

macOS and Windows users must have the Docker program running in order to use docker commands. This Docker image is based on Ubuntu 16.04 and includes the following software stack: Python 3.5, Keras, TensorFlow, nano, htop, IPython, Jupyter, matplotlib, and git.

task: Run the Docker image for the first time. Open a terminal on Linux/Mac, PowerShell on Windows 10, or the Docker CLI on other Windows versions, and then run:
# Create a container and publish port 8888 (used later by Jupyter)
docker run -it -p 8888:8888 jorditorresbcn/dl:latest

If you close the container and you need to re-open it, run:

docker ps -a
docker start -i YOUR_CONTAINER_ID
task: Run the Jupyter Notebook server:
#Inside the container
jupyter notebook --ip=0.0.0.0 --allow-root

On your computer, open your browser and go to http://localhost:8888. The password is aidl.

If you are on Windows and are experiencing connectivity issues, please check THIS.

task: Download and print the MNIST dataset. In your browser, create a new notebook. Download the dataset using the following Python code:
from keras.datasets import mnist
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Print the shape of the variables:

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
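
With the real data, the four shapes printed above should be (60000, 28, 28), (60000,), (10000, 28, 28) and (10000,). Beyond the shapes, it can be useful to look at the pixel range and at how many examples there are of each digit. The sketch below shows one way to do this; the helper `summarize` and the stand-in arrays `fake_images` and `fake_labels` are hypothetical and only mimic the layout of `(X_train, y_train)` so that the example runs without downloading anything.

```python
import numpy as np

def summarize(images, labels):
    """Return basic facts about a pair of image and label arrays."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    return {
        "n_examples": len(images),
        "image_shape": images.shape[1:],
        "pixel_range": (int(images.min()), int(images.max())),
        # Number of examples for each digit 0..9.
        "per_digit": np.bincount(labels, minlength=10).tolist(),
    }

# Hypothetical stand-in data with the same layout as (X_train, y_train).
rng = np.random.RandomState(0)
fake_images = rng.randint(0, 256, size=(100, 28, 28)).astype(np.uint8)
fake_labels = (np.arange(100) % 10).astype(np.uint8)

summary = summarize(fake_images, fake_labels)
print(summary)
```

In the notebook, calling `summarize(X_train, y_train)` on the loaded data would give the same kind of report for the real training set.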

Plot one of the digits using the following code:

%matplotlib inline
from matplotlib import pyplot as plt
plt.imshow(X_train[0])

numsection Lab report

If you don't have time to finish all the tasks during this lab session, please follow your teacher's instructions on how to create and submit your lab report.