Deep Learning – Summer 2024/25

The objective of this course is to provide a comprehensive introduction to deep neural networks, which have consistently demonstrated superior performance across diverse domains, notably in processing and generating images, text, and speech.

The course focuses both on theory, spanning from the basics to the latest advances, and on practical implementations in Python and PyTorch (students implement and train deep neural networks performing image classification, image segmentation, object detection, part-of-speech tagging, lemmatization, speech recognition, reading comprehension, and image generation). Basic Python skills are required, but no previous knowledge of artificial neural networks is needed; basic machine learning understanding is advantageous.

Students work either individually or in small teams on weekly assignments, including competition tasks, where the goal is to obtain the highest performance in the class.

Optionally, you can obtain a micro-credential after passing the course.

About

SIS code: NPFL138
Semester: summer
E-credits: 8
Examination: 3/4 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

lectures: Czech lecture is held on Tuesday 15:40 in S5, English lecture on Tuesday 10:40 in S3; first lecture is on Feb 18
practicals: there are two parallel practicals, a Czech one on Thursday 10:40 in S9, and an English one on Thursday 9:00 in S9; first practicals are on Feb 20
consultations: entirely optional consultations take place on Wednesday 12:20 in S9; first consultations are on Feb 26

All lectures and practicals will be recorded and available on this website.

Lectures

1. Introduction to Deep Learning Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions numpy_entropy pca_first mnist_layers_activations

2. Training Neural Networks Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions sgd_backpropagation sgd_manual mnist_training gym_cartpole

3. Training Neural Networks II Slides PDF Slides CZ Lecture CZ Losses&Metrics CZ Practicals EN Lecture EN Losses&Metrics EN Practicals Questions mnist_regularization mnist_ensemble uppercase

4. Convolutional Neural Networks Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions mnist_cnn torch_dataset mnist_multiple cifar_competition

5. Convolutional Neural Networks II Slides PDF Slides CZ Lecture CZ Transposed Convolution CZ Practicals EN Lecture EN Transposed Convolution EN Practicals Questions cnn_manual cags_classification cags_segmentation

6. Object Detection Slides PDF Slides CZ Lecture EN Lecture EN cnn_manual EN Practicals Questions bboxes_utils svhn_competition

7. Recurrent Neural Networks Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions sequence_classification tagger_we tagger_cle tagger_competition

8. Structured Prediction, CTC, Word2Vec Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions tensorboard_projector tagger_ner ctc_manual speech_recognition

9. Seq2seq, NMT, Transformer Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions lemmatizer_noattn lemmatizer_attn lemmatizer_competition

10. Transformer, BERT, ViT Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions tagger_transformer sentiment_analysis reading_comprehension

11. Deep Reinforcement Learning, VAE Slides PDF Slides CZ Lecture EN Lecture Questions

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

Micro-credential

A micro-credential (aka micro-certificate) is a digital certificate attesting that you have gained knowledge and skills in a specific area. It should be internationally recognized and verifiable using an online EU-wide verification system.

A micro-credential can be obtained by both university students and external participants.

External Participants

If you are not a university student, you can apply to the Deep Learning micro-credential course and then attend the course alongside the university students. Upon successfully passing the course, you will obtain the micro-credential.

The price of the course is 5 000 Kč. The lectures take place for 14 weeks from Feb 18 to May 23; the examination period runs until the end of September.

If you have applied to the course, you only need to attend (or watch) the first lecture and the first practicals; the organization of the whole course and the setup instructions are described there.

University Students

If you have passed the course as a part of your study plan (in academic year 2024/25 or later), you can obtain the micro-credential by paying only an administrative fee. The exact price is not known yet, but our estimate is around 220 Kč. More information will be sent to the course participants during the examination period.

This section lists the lecture content, including references to study materials. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).

References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.

1. Introduction to Deep Learning

Date: Feb 18. Assignments: numpy_entropy, pca_first, mnist_layers_activations.

Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
Gaussian distribution [Section 3.9.3 of DLB]
Machine Learning Basics [Sections 5.1-5.1.3 of DLB]
History of Deep Learning [Section 1.2 of DLB]
Linear regression [Section 5.1.4 of DLB]
Challenges Motivating Deep Learning [Section 5.11 of DLB]
Neural network basics
- Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
- Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
- Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
- Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]
Universal approximation theorem

2. Training Neural Networks

Date: Feb 25. Assignments: sgd_backpropagation, sgd_manual, mnist_training, gym_cartpole.

Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
Hyperparameters and validation sets [Section 5.3 of DLB]
Maximum Likelihood Estimation [Section 5.5 of DLB]
Neural network training
- Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
- Backpropagation algorithm [Sections 6.5 to 6.5.3 of DLB, especially Algorithms 6.1 and 6.2; note that Algorithms 6.5 and 6.6 are used in practice]
- SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
- SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
- SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
- Optimization algorithms with adaptive gradients
  - AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
  - RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
  - Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]

3. Training Neural Networks II

Date: Mar 4. Assignments: mnist_regularization, mnist_ensemble, uppercase.

Regularization [Chapter 7 until Section 7.1 of DLB]
- Early stopping [Section 7.8 of DLB, without the How early stopping acts as a regularizer part]
- L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
- Dataset augmentation [Section 7.4 of DLB]
- Ensembling [Section 7.11 of DLB]
- Dropout [Section 7.12 of DLB]
- Label smoothing [Section 7.5.1 of DLB]
Saturating non-linearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
Parameter initialization strategies [Section 8.4 of DLB]
Gradient clipping [Section 10.11.1 of DLB]
Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30)]

4. Convolutional Neural Networks

Date: Mar 11. Assignments: mnist_cnn, torch_dataset, mnist_multiple, cifar_competition.

Introduction to convolutional networks [Chapter 9 and Sections 9.1-9.3 of DLB]
Convolution as operation on 4D tensors [Section 9.5 of DLB, notably Equations (9.7) and (9.8)]
Max pooling and average pooling [Section 9.3 of DLB]
Stride and Padding schemes [Section 9.5 of DLB]
AlexNet [ImageNet Classification with Deep Convolutional Neural Networks]
VGG [Very Deep Convolutional Networks for Large-Scale Image Recognition]
GoogLeNet (aka Inception) [Going Deeper with Convolutions]
Batch normalization [Section 8.7.1 of DLB, optionally the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift]
Inception v2 and v3 [Rethinking the Inception Architecture for Computer Vision]

11. Deep Reinforcement Learning, VAE

Multi-armed bandits [Sections 2-2.4 of RLB]
Markov Decision Process [Sections 3-3.3 of RLB]
Policies and Value Functions [Section 3.5 of RLB]
Policy Gradient Methods [Sections 13-13.1 of RLB]
Policy Gradient Theorem [Section 13.2 of RLB]
REINFORCE algorithm [Section 13.3 of RLB]
REINFORCE with baseline algorithm [Section 13.4 of RLB]
Autoencoders (undercomplete, sparse, denoising) [Chapter 14, Sections 14-14.2.3 of DLB]
Deep Generative Models using Differentiable Generator Nets [Section 20.10.2 of DLB]
Variational Autoencoders [Section 20.10.3 plus Reparametrization trick from Section 20.9 (but not Section 20.9.1) of DLB, Auto-Encoding Variational Bayes]

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments (any non-zero number of points counts as solved), you automatically pass the exam with grade 1.

Environment

The tasks are evaluated automatically using the ReCodEx Code Examiner.

The evaluation is performed using Python 3.11, PyTorch 2.6.0, PyTorch Image Models 1.0.14, HF Transformers 4.48.1, and Gymnasium 1.0.0. You should install the exact versions of these packages yourselves.

Teamwork

Solving assignments in teams (of size at most 3) is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. To allow plagiarism detection, each such solution must explicitly list all members of the team using this template.

No Cheating

Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty. While discussing assignments with any classmate is fine, each team must complete the assignments themselves, without using code they did not write (unless explicitly allowed). Of course, inside a team you are allowed to share code and submit identical solutions. Note that all students involved in cheating will be punished, so if you share your source code with a friend, both you and your friend will be punished. That also means that you should never publish your solutions.

numpy_entropy

Deadline: Mar 05, 22:00 2 points

The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with the numpy_entropy.py template.

Load a file specified in args.data_path, whose lines consist of data points of our dataset, and load a file specified in args.model_path, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability).

Then compute the following quantities using NumPy, and print each of them on a separate line, rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):

entropy H(data distribution)
cross-entropy H(data distribution, model distribution)
KL-divergence D_KL(data distribution || model distribution)

Use natural logarithms to compute the entropies and the divergence.
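
The three quantities above can be illustrated on a toy example. The following is only a minimal sketch under stated assumptions: the function names and the two toy distributions are hypothetical, both distributions are assumed to be already aligned over the same outcomes, and loading the files from args.data_path and args.model_path is deliberately left out.

```python
import numpy as np


def entropy(p):
    """Entropy H(p) in nats; outcomes with p(x) = 0 contribute zero."""
    p = np.asarray(p, dtype=np.float64)
    support = p > 0
    return float(-np.sum(p[support] * np.log(p[support])))


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; infinite if q is zero where p is positive."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")
    return float(-np.sum(p[support] * np.log(q[support])))


def kl_divergence(p, q):
    """D_KL(p || q) computed as H(p, q) - H(p)."""
    return cross_entropy(p, q) - entropy(p)


# Hypothetical toy distributions over three outcomes (not the assignment data).
data_dist = np.array([0.5, 0.25, 0.25])
model_dist = np.array([1 / 3, 1 / 3, 1 / 3])

print(f"Entropy: {entropy(data_dist):.2f} nats")
print(f"Crossentropy: {cross_entropy(data_dist, model_dist):.2f} nats")
print(f"KL divergence: {kl_divergence(data_dist, model_dist):.2f} nats")
```

Formatting with `:.2f` prints `inf` for positive infinity, which matches the output format shown in the examples.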

python3 numpy_entropy.py --data_path numpy_entropy_data_1.txt --model_path numpy_entropy_model_1.txt

Entropy: 0.96 nats
Crossentropy: 0.99 nats
KL divergence: 0.03 nats

python3 numpy_entropy.py --data_path numpy_entropy_data_2.txt --model_path numpy_entropy_model_2.txt

Entropy: 0.96 nats
Crossentropy: inf nats
KL divergence: inf nats

The last three tests use data available only in ReCodEx. They are analogous to the numpy_entropy_data_3.txt/numpy_entropy_model_3.txt and numpy_entropy_data_4.txt/numpy_entropy_model_4.txt pairs, but are generated with different random seeds.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 numpy_entropy.py --data_path numpy_entropy_data_3.txt --model_path numpy_entropy_model_3.txt

Entropy: 4.15 nats
Crossentropy: 4.23 nats
KL divergence: 0.08 nats

python3 numpy_entropy.py --data_path numpy_entropy_data_4.txt --model_path numpy_entropy_model_4.txt

Entropy: 4.99 nats
Crossentropy: 5.03 nats
KL divergence: 0.04 nats

pca_first

Deadline: Mar 05, 22:00 2 points

The goal of this exercise is to familiarize yourself with PyTorch torch.Tensors, shapes and basic tensor manipulation methods. Start with the pca_first.py template.

In this assignment, you should compute the covariance matrix of several examples from the MNIST dataset, then compute the first principal component, and quantify its explained variance. It is fine if you are not familiar with terms like covariance matrix or principal component – the template contains a detailed description of what you have to do.

Finally, you might want to read the Introduction to PyTorch Tensors.
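
For readers unfamiliar with the terms, the computation can be sketched on toy data. This is a hedged illustration only, not the template's prescribed procedure: it assumes the covariance is normalized by the number of examples and that the first principal component is found by power iteration (suggested by the --iterations argument, but not stated here). The function name and the toy data are hypothetical, and NumPy is used instead of PyTorch only to keep the sketch light on dependencies.

```python
import numpy as np


def first_principal_component(data, iterations):
    """Return (total variance, explained-variance ratio, direction) of the first PC."""
    data = np.asarray(data, dtype=np.float64)  # shape [examples, features]
    centered = data - data.mean(axis=0)
    # Covariance matrix; dividing by the number of examples is an assumption.
    cov = centered.T @ centered / data.shape[0]
    # The total variance is the sum of the per-feature variances (the trace).
    total_variance = np.trace(cov)

    # Power iteration: repeatedly multiply by cov and renormalize.
    v = np.ones(cov.shape[0]) / np.sqrt(cov.shape[0])
    for _ in range(iterations):
        v = cov @ v
        v = v / np.linalg.norm(v)

    # Rayleigh quotient of the unit vector v estimates the largest eigenvalue.
    eigenvalue = v @ cov @ v
    return total_variance, eigenvalue / total_variance, v


# Hypothetical toy data: the first feature has a much larger variance.
rng = np.random.default_rng(42)
toy_data = rng.normal(size=(200, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
total, explained, direction = first_principal_component(toy_data, iterations=64)
print(f"Total variance: {total:.2f}")
print(f"Explained variance: {100 * explained:.2f}%")
```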

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 pca_first.py --examples=1024 --iterations=64

Total variance: 53.12
Explained variance: 9.64%

python3 pca_first.py --examples=8192 --iterations=128

Total variance: 53.05
Explained variance: 9.89%

python3 pca_first.py --examples=55000 --iterations=1024

Total variance: 52.74
Explained variance: 9.71%

mnist_layers_activations

Deadline: Mar 05, 22:00 2 points

Before solving the assignment, start by playing with example_pytorch_tensorboard.py to familiarize yourself with PyTorch and TensorBoard. After running the example, start TensorBoard in the same directory using tensorboard --logdir logs, open http://localhost:6006 in a browser, and explore the generated logs.

Your goal is to modify the mnist_layers_activations.py template such that a user-specified neural network is constructed:

The number of hidden layers (including zero) can be specified on the command line using the parameter hidden_layers.
The activation function of these hidden layers can also be specified as a command line parameter activation, with supported values of none, relu, tanh and sigmoid.
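
One possible way to build such a configurable network is sketched below. It is an illustration under assumptions rather than the template's actual structure: the build_model function, the ACTIVATIONS table and the hidden layer size of 100 are hypothetical; only the 28x28 MNIST input size and the 10 output classes follow from the dataset.

```python
import torch

# Maps the command line value to a module class; "none" becomes an identity.
ACTIVATIONS = {
    "none": torch.nn.Identity,
    "relu": torch.nn.ReLU,
    "tanh": torch.nn.Tanh,
    "sigmoid": torch.nn.Sigmoid,
}


def build_model(hidden_layers, activation, input_size=28 * 28, hidden_size=100, classes=10):
    """Stack `hidden_layers` Linear+activation pairs followed by an output layer."""
    layers = [torch.nn.Flatten()]
    size = input_size
    for _ in range(hidden_layers):
        layers.append(torch.nn.Linear(size, hidden_size))
        layers.append(ACTIVATIONS[activation]())
        size = hidden_size
    # The output layer produces unnormalized logits for the classes.
    layers.append(torch.nn.Linear(size, classes))
    return torch.nn.Sequential(*layers)


model = build_model(hidden_layers=3, activation="relu")
logits = model(torch.zeros(2, 1, 28, 28))
print(logits.shape)
```

With hidden_layers=0 the loop body never runs, so the model reduces to a single linear layer on the flattened input.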

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 mnist_layers_activations.py --epochs=1 --hidden_layers=0 --activation=none

Epoch 1/1 1.0s train_loss=0.5374 train_accuracy=0.8614 dev_loss=0.2768 dev_accuracy=0.9270

python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=none

Epoch 1/1 1.4s train_loss=0.3791 train_accuracy=0.8922 dev_loss=0.2400 dev_accuracy=0.9290

python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=relu

Epoch 1/1 1.5s train_loss=0.3178 train_accuracy=0.9104 dev_loss=0.1482 dev_accuracy=0.9566

python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=tanh

Epoch 1/1 1.4s train_loss=0.3318 train_accuracy=0.9061 dev_loss=0.1632 dev_accuracy=0.9530

python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=sigmoid

Epoch 1/1 1.4s train_loss=0.4985 train_accuracy=0.8788 dev_loss=0.2156 dev_accuracy=0.9382

python3 mnist_layers_activations.py --epochs=1 --hidden_layers=3 --activation=relu

Epoch 1/1 1.7s train_loss=0.2700 train_accuracy=0.9213 dev_loss=0.1188 dev_accuracy=0.9680

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 mnist_layers_activations.py --hidden_layers=0 --activation=none

Epoch  1/10 1.0s train_loss=0.5374 train_accuracy=0.8614 dev_loss=0.2768 dev_accuracy=0.9270
Epoch  5/10 1.0s train_loss=0.2779 train_accuracy=0.9220 dev_loss=0.2201 dev_accuracy=0.9430
Epoch 10/10 1.0s train_loss=0.2591 train_accuracy=0.9278 dev_loss=0.2139 dev_accuracy=0.9432

python3 mnist_layers_activations.py --hidden_layers=1 --activation=none

Epoch  1/10 1.4s train_loss=0.3791 train_accuracy=0.8922 dev_loss=0.2400 dev_accuracy=0.9290
Epoch  5/10 1.4s train_loss=0.2775 train_accuracy=0.9225 dev_loss=0.2217 dev_accuracy=0.9396
Epoch 10/10 1.4s train_loss=0.2645 train_accuracy=0.9247 dev_loss=0.2264 dev_accuracy=0.9378

python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu

Epoch  1/10 1.4s train_loss=0.3178 train_accuracy=0.9104 dev_loss=0.1482 dev_accuracy=0.9566
Epoch  5/10 1.5s train_loss=0.0627 train_accuracy=0.9811 dev_loss=0.0827 dev_accuracy=0.9786
Epoch 10/10 1.6s train_loss=0.0240 train_accuracy=0.9930 dev_loss=0.0782 dev_accuracy=0.9810

python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh

Epoch  1/10 1.4s train_loss=0.3318 train_accuracy=0.9061 dev_loss=0.1632 dev_accuracy=0.9530
Epoch  5/10 1.4s train_loss=0.0732 train_accuracy=0.9798 dev_loss=0.0837 dev_accuracy=0.9768
Epoch 10/10 1.5s train_loss=0.0254 train_accuracy=0.9943 dev_loss=0.0733 dev_accuracy=0.9790

python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid

Epoch  1/10 1.4s train_loss=0.4985 train_accuracy=0.8788 dev_loss=0.2156 dev_accuracy=0.9382
Epoch  5/10 1.4s train_loss=0.1249 train_accuracy=0.9641 dev_loss=0.1077 dev_accuracy=0.9698
Epoch 10/10 1.4s train_loss=0.0605 train_accuracy=0.9837 dev_loss=0.0781 dev_accuracy=0.9762

python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu

Epoch  1/10 1.7s train_loss=0.2700 train_accuracy=0.9213 dev_loss=0.1188 dev_accuracy=0.9680
Epoch  5/10 1.9s train_loss=0.0477 train_accuracy=0.9849 dev_loss=0.0787 dev_accuracy=0.9794
Epoch 10/10 1.9s train_loss=0.0248 train_accuracy=0.9916 dev_loss=0.1015 dev_accuracy=0.9762

python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu

Epoch  1/10 2.8s train_loss=0.3562 train_accuracy=0.8911 dev_loss=0.1556 dev_accuracy=0.9598
Epoch  5/10 3.3s train_loss=0.0864 train_accuracy=0.9764 dev_loss=0.1164 dev_accuracy=0.9686
Epoch 10/10 3.3s train_loss=0.0474 train_accuracy=0.9874 dev_loss=0.0877 dev_accuracy=0.9774

python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid

Epoch  1/10 2.6s train_loss=1.9711 train_accuracy=0.1803 dev_loss=1.8477 dev_accuracy=0.2148
Epoch  5/10 2.6s train_loss=0.9947 train_accuracy=0.5815 dev_loss=0.8246 dev_accuracy=0.6392
Epoch 10/10 2.6s train_loss=0.4406 train_accuracy=0.8924 dev_loss=0.4239 dev_accuracy=0.8992

sgd_backpropagation

Deadline: Mar 12, 22:00 3 points

The template was updated on Feb 27, 17:30. The original one did not shuffle the training data. You do not need to redownload it, ReCodEx accepts both variants. However, the Tests and Examples have been regenerated using the updated template.

In this exercise you will learn how to compute gradients using so-called automatic differentiation, which runs the backpropagation algorithm automatically for a given computation. You can read the Automatic Differentiation with torch.autograd tutorial if interested. After computing the gradient, you should perform training by running manually implemented minibatch stochastic gradient descent.

Starting with the sgd_backpropagation.py template, you should:

implement a neural network with a single tanh hidden layer and categorical output layer;
compute the crossentropy loss;
use .backward() to automatically compute the gradient of the loss with respect to all variables;
perform the SGD update.

This assignment also demonstrates the most important parts of the npfl138.TrainableModule that we are using.
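
The steps listed above can be sketched on random toy data. This is a minimal sketch, not the template's code: the tensor names, the sizes, the initialization scale and the sgd_step helper are all hypothetical, and the model is trained on a single fixed batch only to show the mechanics of .backward() followed by a manual update.

```python
import torch

torch.manual_seed(0)
inputs, hidden, classes = 8, 4, 3  # hypothetical toy sizes

# Parameters are leaf tensors for which autograd tracks gradients.
W1 = (torch.randn(inputs, hidden) * 0.1).requires_grad_()
b1 = torch.zeros(hidden, requires_grad=True)
W2 = (torch.randn(hidden, classes) * 0.1).requires_grad_()
b2 = torch.zeros(classes, requires_grad=True)
params = [W1, b1, W2, b2]


def forward(x):
    """Single tanh hidden layer followed by a linear layer producing logits."""
    h = torch.tanh(x @ W1 + b1)
    return h @ W2 + b2


def sgd_step(x, y, learning_rate):
    """One SGD update on a batch; returns the loss before the update."""
    loss = torch.nn.functional.cross_entropy(forward(x), y)
    loss.backward()  # fills .grad of every parameter
    with torch.no_grad():  # the update itself must not be tracked
        for p in params:
            p -= learning_rate * p.grad
            p.grad = None  # reset, otherwise gradients would accumulate
    return loss.item()


x = torch.randn(16, inputs)
y = torch.randint(0, classes, (16,))
losses = [sgd_step(x, y, learning_rate=0.1) for _ in range(50)]
print(f"loss before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Note that torch.nn.functional.cross_entropy takes unnormalized logits and applies the softmax internally, so the forward pass does not apply it.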

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 sgd_backpropagation.py --epochs=2 --batch_size=64 --hidden_layer=20 --learning_rate=0.1

Dev accuracy after epoch 1 is 92.98
Dev accuracy after epoch 2 is 94.42
Test accuracy after epoch 2 is 92.72

python3 sgd_backpropagation.py --epochs=2 --batch_size=100 --hidden_layer=32 --learning_rate=0.2

Dev accuracy after epoch 1 is 93.58
Dev accuracy after epoch 2 is 95.26
Test accuracy after epoch 2 is 93.75

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 sgd_backpropagation.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1

Dev accuracy after epoch 1 is 92.98
Dev accuracy after epoch 2 is 94.42
Dev accuracy after epoch 3 is 94.68
Dev accuracy after epoch 4 is 95.08
Dev accuracy after epoch 5 is 95.28
Dev accuracy after epoch 6 is 95.20
Dev accuracy after epoch 7 is 95.52
Dev accuracy after epoch 8 is 95.32
Dev accuracy after epoch 9 is 95.66
Dev accuracy after epoch 10 is 95.84
Test accuracy after epoch 10 is 95.02

python3 sgd_backpropagation.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2

Dev accuracy after epoch 1 is 93.58
Dev accuracy after epoch 2 is 95.26
Dev accuracy after epoch 3 is 95.66
Dev accuracy after epoch 4 is 95.90
Dev accuracy after epoch 5 is 96.26
Dev accuracy after epoch 6 is 96.52
Dev accuracy after epoch 7 is 96.52
Dev accuracy after epoch 8 is 96.74
Dev accuracy after epoch 9 is 96.74
Dev accuracy after epoch 10 is 96.62
Test accuracy after epoch 10 is 95.84

sgd_manual

Deadline: Mar 19, 22:00 (extended from Mar 12) 2 points

The goal in this exercise is to extend your solution to the sgd_backpropagation assignment by manually computing the gradient.

While in this assignment we compute the gradient manually, we will nearly always use automatic differentiation. Therefore, the assignment is more of a mathematical exercise than a real-world application. Furthermore, we will compute the derivatives together during the Mar 06 practicals.

Start with the sgd_manual.py template, which is based on the sgd_backpropagation.py one.

Note that ReCodEx disables the PyTorch automatic differentiation during evaluation.
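
As a reference for the manual computation, the derivatives for a single example can be written down as follows. The notation is an assumption made for this sketch and need not match the template: x is a row vector, W_1, b_1, W_2, b_2 are the weights and biases of the two layers, o is the softmax output, and 1_y is the one-hot vector of the gold class y. For a minibatch, the gradients are averaged over the examples.

```latex
\begin{aligned}
z_1 &= x W_1 + b_1, \quad h = \tanh(z_1), \quad z_2 = h W_2 + b_2, \quad o = \operatorname{softmax}(z_2), \quad L = -\log o_y \\
\frac{\partial L}{\partial z_2} &= o - 1_y \\
\frac{\partial L}{\partial W_2} &= h^\top (o - 1_y), \qquad
\frac{\partial L}{\partial b_2} = o - 1_y \\
\frac{\partial L}{\partial h} &= (o - 1_y)\, W_2^\top \\
\frac{\partial L}{\partial z_1} &= \frac{\partial L}{\partial h} \odot (1 - h^2) \\
\frac{\partial L}{\partial W_1} &= x^\top \frac{\partial L}{\partial z_1}, \qquad
\frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial z_1}
\end{aligned}
```

The factor 1 - h^2 is the derivative of tanh expressed through its output, which is why the hidden activations can be reused in the backward pass.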

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 sgd_manual.py --epochs=2 --batch_size=64 --hidden_layer=20 --learning_rate=0.1

Dev accuracy after epoch 1 is 92.98
Dev accuracy after epoch 2 is 94.42
Test accuracy after epoch 2 is 92.72

python3 sgd_manual.py --epochs=2 --batch_size=100 --hidden_layer=32 --learning_rate=0.2

Dev accuracy after epoch 1 is 93.58
Dev accuracy after epoch 2 is 95.26
Test accuracy after epoch 2 is 93.75

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 sgd_manual.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1

Dev accuracy after epoch 1 is 92.98
Dev accuracy after epoch 2 is 94.42
Dev accuracy after epoch 3 is 94.68
Dev accuracy after epoch 4 is 95.08
Dev accuracy after epoch 5 is 95.28
Dev accuracy after epoch 6 is 95.20
Dev accuracy after epoch 7 is 95.52
Dev accuracy after epoch 8 is 95.32
Dev accuracy after epoch 9 is 95.66
Dev accuracy after epoch 10 is 95.84
Test accuracy after epoch 10 is 95.02

python3 sgd_manual.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2

Dev accuracy after epoch 1 is 93.58
Dev accuracy after epoch 2 is 95.26
Dev accuracy after epoch 3 is 95.66
Dev accuracy after epoch 4 is 95.90
Dev accuracy after epoch 5 is 96.26
Dev accuracy after epoch 6 is 96.52
Dev accuracy after epoch 7 is 96.52
Dev accuracy after epoch 8 is 96.74
Dev accuracy after epoch 9 is 96.74
Dev accuracy after epoch 10 is 96.62
Test accuracy after epoch 10 is 95.84

mnist_training

Deadline: Mar 12, 22:00 2 points

This exercise should teach you to use different optimizers, learning rates, and learning rate decays. Your goal is to modify the mnist_training.py template and implement the following:

Use the specified optimizer (either SGD or Adam).
Optionally use momentum for the SGD optimizer.
Use the specified learning rate for the optimizer.
Optionally use a given learning rate schedule. The schedule can be either linear, exponential, or cosine. If a schedule is specified, you also get a final learning rate, and the learning rate should be gradually decreased during training to reach the final learning rate just after the training (i.e., the first update after the training would use exactly the final learning rate).
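
The three schedules can be illustrated as plain functions of the training progress. The sketch below is an assumption about one common formulation, not the template's required implementation: the scheduled_lr function and its parameter names are hypothetical, and the decay is applied per update step, so that step 0 uses the initial rate and step total_steps (the first update after training) would use the final rate.

```python
import math


def scheduled_lr(step, total_steps, initial, final, decay):
    """Learning rate for update number `step` out of `total_steps` updates."""
    t = step / total_steps  # training progress in [0, 1]
    if decay == "linear":
        # Straight line from the initial to the final rate.
        return initial + (final - initial) * t
    if decay == "exponential":
        # Constant multiplicative factor per step.
        return initial * (final / initial) ** t
    if decay == "cosine":
        # Half of a cosine wave from the initial down to the final rate.
        return final + (initial - final) * 0.5 * (1 + math.cos(math.pi * t))
    raise ValueError(f"Unknown decay {decay!r}")


for decay in ["linear", "exponential", "cosine"]:
    rates = [scheduled_lr(step, 10, 0.01, 0.001, decay) for step in range(11)]
    print(decay, " ".join(f"{rate:.4f}" for rate in rates))
```

In PyTorch, comparable behavior is available through the schedulers in torch.optim.lr_scheduler; whether the template expects those or a custom function is not stated here.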

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 mnist_training.py --epochs=1 --optimizer=SGD --learning_rate=0.01

Epoch 1/1 1.2s train_loss=0.8300 train_accuracy=0.7960 dev_loss=0.3780 dev_accuracy=0.9060

python3 mnist_training.py --epochs=1 --optimizer=SGD --learning_rate=0.01 --momentum=0.9

Epoch 1/1 1.2s train_loss=0.3731 train_accuracy=0.8952 dev_loss=0.1912 dev_accuracy=0.9472

python3 mnist_training.py --epochs=1 --optimizer=SGD --learning_rate=0.1

Epoch 1/1 1.1s train_loss=0.3660 train_accuracy=0.8970 dev_loss=0.1945 dev_accuracy=0.9460

python3 mnist_training.py --epochs=1 --optimizer=Adam --learning_rate=0.001

Epoch 1/1 1.5s train_loss=0.3025 train_accuracy=0.9152 dev_loss=0.1487 dev_accuracy=0.9582

python3 mnist_training.py --epochs=1 --optimizer=Adam --learning_rate=0.01

Epoch 1/1 1.6s train_loss=0.2333 train_accuracy=0.9297 dev_loss=0.1618 dev_accuracy=0.9508

python3 mnist_training.py --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=linear --learning_rate_final=0.0001

Epoch 1/2 1.6s train_loss=0.2162 train_lr=0.0050 train_accuracy=0.9341 dev_loss=0.1150 dev_accuracy=0.9658
Epoch 2/2 1.9s train_loss=0.0790 train_lr=0.0001 train_accuracy=0.9759 dev_loss=0.0739 dev_accuracy=0.9778
Next learning rate to be used: 0.0001

python3 mnist_training.py --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=exponential --learning_rate_final=0.001

Epoch 1/2 1.6s train_loss=0.2022 train_lr=0.0032 train_accuracy=0.9383 dev_loss=0.0989 dev_accuracy=0.9746
Epoch 2/2 1.8s train_loss=0.0748 train_lr=0.0010 train_accuracy=0.9769 dev_loss=0.0777 dev_accuracy=0.9790
Next learning rate to be used: 0.001

python3 mnist_training.py --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=cosine --learning_rate_final=0.0001

Epoch 1/2 1.7s train_loss=0.2192 train_lr=0.0050 train_accuracy=0.9333 dev_loss=0.1155 dev_accuracy=0.9680
Epoch 2/2 1.9s train_loss=0.0720 train_lr=0.0001 train_accuracy=0.9776 dev_loss=0.0765 dev_accuracy=0.9790
Next learning rate to be used: 0.0001

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 mnist_training.py --optimizer=SGD --learning_rate=0.01

Epoch  1/10 1.1s train_loss=0.8300 train_accuracy=0.7960 dev_loss=0.3780 dev_accuracy=0.9060
Epoch  2/10 1.1s train_loss=0.4088 train_accuracy=0.8892 dev_loss=0.2940 dev_accuracy=0.9208
Epoch  3/10 1.1s train_loss=0.3473 train_accuracy=0.9030 dev_loss=0.2585 dev_accuracy=0.9286
Epoch  4/10 1.1s train_loss=0.3144 train_accuracy=0.9116 dev_loss=0.2383 dev_accuracy=0.9352
Epoch  5/10 1.1s train_loss=0.2911 train_accuracy=0.9184 dev_loss=0.2230 dev_accuracy=0.9404
Epoch  6/10 1.1s train_loss=0.2729 train_accuracy=0.9235 dev_loss=0.2093 dev_accuracy=0.9432
Epoch  7/10 1.2s train_loss=0.2577 train_accuracy=0.9281 dev_loss=0.1993 dev_accuracy=0.9480
Epoch  8/10 1.1s train_loss=0.2442 train_accuracy=0.9316 dev_loss=0.1903 dev_accuracy=0.9510
Epoch  9/10 1.1s train_loss=0.2326 train_accuracy=0.9350 dev_loss=0.1828 dev_accuracy=0.9546
Epoch 10/10 1.1s train_loss=0.2222 train_accuracy=0.9379 dev_loss=0.1744 dev_accuracy=0.9546

python3 mnist_training.py --optimizer=SGD --learning_rate=0.01 --momentum=0.9

Epoch  1/10 1.2s train_loss=0.3731 train_accuracy=0.8952 dev_loss=0.1912 dev_accuracy=0.9472
Epoch  2/10 1.3s train_loss=0.1942 train_accuracy=0.9437 dev_loss=0.1322 dev_accuracy=0.9662
Epoch  3/10 1.4s train_loss=0.1432 train_accuracy=0.9588 dev_loss=0.1137 dev_accuracy=0.9688
Epoch  4/10 1.4s train_loss=0.1148 train_accuracy=0.9674 dev_loss=0.0954 dev_accuracy=0.9744
Epoch  5/10 1.4s train_loss=0.0962 train_accuracy=0.9728 dev_loss=0.0914 dev_accuracy=0.9740
Epoch  6/10 1.4s train_loss=0.0824 train_accuracy=0.9767 dev_loss=0.0823 dev_accuracy=0.9772
Epoch  7/10 1.4s train_loss=0.0718 train_accuracy=0.9801 dev_loss=0.0806 dev_accuracy=0.9780
Epoch  8/10 1.4s train_loss=0.0640 train_accuracy=0.9817 dev_loss=0.0741 dev_accuracy=0.9800
Epoch  9/10 1.4s train_loss=0.0565 train_accuracy=0.9841 dev_loss=0.0775 dev_accuracy=0.9800
Epoch 10/10 1.4s train_loss=0.0509 train_accuracy=0.9861 dev_loss=0.0737 dev_accuracy=0.9788

python3 mnist_training.py --optimizer=SGD --learning_rate=0.1

Epoch  1/10 1.2s train_loss=0.3660 train_accuracy=0.8970 dev_loss=0.1945 dev_accuracy=0.9460
Epoch  2/10 1.1s train_loss=0.1940 train_accuracy=0.9438 dev_loss=0.1320 dev_accuracy=0.9652
Epoch  3/10 1.1s train_loss=0.1433 train_accuracy=0.9588 dev_loss=0.1101 dev_accuracy=0.9696
Epoch  4/10 1.2s train_loss=0.1146 train_accuracy=0.9673 dev_loss=0.0941 dev_accuracy=0.9748
Epoch  5/10 1.2s train_loss=0.0949 train_accuracy=0.9735 dev_loss=0.0915 dev_accuracy=0.9754
Epoch  6/10 1.1s train_loss=0.0816 train_accuracy=0.9766 dev_loss=0.0804 dev_accuracy=0.9782
Epoch  7/10 1.1s train_loss=0.0714 train_accuracy=0.9800 dev_loss=0.0783 dev_accuracy=0.9792
Epoch  8/10 1.1s train_loss=0.0627 train_accuracy=0.9819 dev_loss=0.0734 dev_accuracy=0.9804
Epoch  9/10 1.1s train_loss=0.0558 train_accuracy=0.9843 dev_loss=0.0759 dev_accuracy=0.9814
Epoch 10/10 1.2s train_loss=0.0502 train_accuracy=0.9860 dev_loss=0.0728 dev_accuracy=0.9806

python3 mnist_training.py --optimizer=Adam --learning_rate=0.001

Epoch  1/10 1.5s train_loss=0.3025 train_accuracy=0.9152 dev_loss=0.1487 dev_accuracy=0.9582
Epoch  2/10 1.6s train_loss=0.1349 train_accuracy=0.9601 dev_loss=0.1003 dev_accuracy=0.9724
Epoch  3/10 1.6s train_loss=0.0909 train_accuracy=0.9724 dev_loss=0.0893 dev_accuracy=0.9756
Epoch  4/10 1.6s train_loss=0.0686 train_accuracy=0.9797 dev_loss=0.0879 dev_accuracy=0.9742
Epoch  5/10 1.6s train_loss=0.0542 train_accuracy=0.9838 dev_loss=0.0755 dev_accuracy=0.9782
Epoch  6/10 1.6s train_loss=0.0434 train_accuracy=0.9873 dev_loss=0.0781 dev_accuracy=0.9786
Epoch  7/10 1.6s train_loss=0.0344 train_accuracy=0.9900 dev_loss=0.0735 dev_accuracy=0.9796
Epoch  8/10 1.7s train_loss=0.0280 train_accuracy=0.9913 dev_loss=0.0746 dev_accuracy=0.9800
Epoch  9/10 1.6s train_loss=0.0225 train_accuracy=0.9934 dev_loss=0.0768 dev_accuracy=0.9814
Epoch 10/10 1.6s train_loss=0.0189 train_accuracy=0.9947 dev_loss=0.0838 dev_accuracy=0.9780

python3 mnist_training.py --optimizer=Adam --learning_rate=0.01

Epoch  1/10 1.6s train_loss=0.2333 train_accuracy=0.9297 dev_loss=0.1618 dev_accuracy=0.9508
Epoch  2/10 1.9s train_loss=0.1456 train_accuracy=0.9569 dev_loss=0.1718 dev_accuracy=0.9600
Epoch  3/10 1.9s train_loss=0.1257 train_accuracy=0.9637 dev_loss=0.1653 dev_accuracy=0.9626
Epoch  4/10 1.9s train_loss=0.1128 train_accuracy=0.9679 dev_loss=0.1789 dev_accuracy=0.9604
Epoch  5/10 1.9s train_loss=0.1013 train_accuracy=0.9718 dev_loss=0.1316 dev_accuracy=0.9684
Epoch  6/10 2.0s train_loss=0.0992 train_accuracy=0.9729 dev_loss=0.1425 dev_accuracy=0.9642
Epoch  7/10 2.0s train_loss=0.0963 train_accuracy=0.9750 dev_loss=0.1814 dev_accuracy=0.9702
Epoch  8/10 2.0s train_loss=0.0969 train_accuracy=0.9759 dev_loss=0.1727 dev_accuracy=0.9712
Epoch  9/10 2.0s train_loss=0.0833 train_accuracy=0.9786 dev_loss=0.1854 dev_accuracy=0.9666
Epoch 10/10 2.0s train_loss=0.0808 train_accuracy=0.9796 dev_loss=0.1904 dev_accuracy=0.9710

python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=linear --learning_rate_final=0.0001

Epoch  1/10 1.6s train_loss=0.2329 train_lr=0.0090 train_accuracy=0.9295 dev_loss=0.1592 dev_accuracy=0.9542
Epoch  2/10 1.9s train_loss=0.1313 train_lr=0.0080 train_accuracy=0.9611 dev_loss=0.1211 dev_accuracy=0.9674
Epoch  3/10 1.9s train_loss=0.0983 train_lr=0.0070 train_accuracy=0.9696 dev_loss=0.1034 dev_accuracy=0.9734
Epoch  4/10 1.9s train_loss=0.0713 train_lr=0.0060 train_accuracy=0.9784 dev_loss=0.1250 dev_accuracy=0.9690
Epoch  5/10 1.9s train_loss=0.0557 train_lr=0.0051 train_accuracy=0.9825 dev_loss=0.1086 dev_accuracy=0.9748
Epoch  6/10 1.9s train_loss=0.0414 train_lr=0.0041 train_accuracy=0.9867 dev_loss=0.0983 dev_accuracy=0.9776
Epoch  7/10 1.9s train_loss=0.0246 train_lr=0.0031 train_accuracy=0.9921 dev_loss=0.1009 dev_accuracy=0.9782
Epoch  8/10 1.9s train_loss=0.0144 train_lr=0.0021 train_accuracy=0.9955 dev_loss=0.0996 dev_accuracy=0.9798
Epoch  9/10 2.0s train_loss=0.0072 train_lr=0.0011 train_accuracy=0.9979 dev_loss=0.0999 dev_accuracy=0.9800
Epoch 10/10 1.9s train_loss=0.0039 train_lr=0.0001 train_accuracy=0.9993 dev_loss=0.0985 dev_accuracy=0.9812
Next learning rate to be used: 0.0001

python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=exponential --learning_rate_final=0.001

Epoch  1/10 1.6s train_loss=0.2235 train_lr=0.0079 train_accuracy=0.9331 dev_loss=0.1471 dev_accuracy=0.9584
Epoch  2/10 1.9s train_loss=0.1151 train_lr=0.0063 train_accuracy=0.9654 dev_loss=0.1097 dev_accuracy=0.9706
Epoch  3/10 1.9s train_loss=0.0782 train_lr=0.0050 train_accuracy=0.9757 dev_loss=0.1059 dev_accuracy=0.9748
Epoch  4/10 1.9s train_loss=0.0521 train_lr=0.0040 train_accuracy=0.9839 dev_loss=0.0984 dev_accuracy=0.9720
Epoch  5/10 1.9s train_loss=0.0366 train_lr=0.0032 train_accuracy=0.9879 dev_loss=0.1046 dev_accuracy=0.9764
Epoch  6/10 1.9s train_loss=0.0235 train_lr=0.0025 train_accuracy=0.9921 dev_loss=0.0965 dev_accuracy=0.9798
Epoch  7/10 1.9s train_loss=0.0144 train_lr=0.0020 train_accuracy=0.9954 dev_loss=0.0914 dev_accuracy=0.9810
Epoch  8/10 1.9s train_loss=0.0101 train_lr=0.0016 train_accuracy=0.9970 dev_loss=0.0924 dev_accuracy=0.9808
Epoch  9/10 1.9s train_loss=0.0057 train_lr=0.0013 train_accuracy=0.9986 dev_loss=0.1007 dev_accuracy=0.9820
Epoch 10/10 1.9s train_loss=0.0038 train_lr=0.0010 train_accuracy=0.9992 dev_loss=0.0926 dev_accuracy=0.9832
Next learning rate to be used: 0.001

python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=cosine --learning_rate_final=0.0001

Epoch  1/10 1.6s train_loss=0.2362 train_lr=0.0098 train_accuracy=0.9288 dev_loss=0.1563 dev_accuracy=0.9556
Epoch  2/10 1.9s train_loss=0.1340 train_lr=0.0091 train_accuracy=0.9605 dev_loss=0.1450 dev_accuracy=0.9652
Epoch  3/10 1.9s train_loss=0.1088 train_lr=0.0080 train_accuracy=0.9688 dev_loss=0.1465 dev_accuracy=0.9612
Epoch  4/10 2.0s train_loss=0.0774 train_lr=0.0066 train_accuracy=0.9767 dev_loss=0.1184 dev_accuracy=0.9706
Epoch  5/10 1.9s train_loss=0.0569 train_lr=0.0050 train_accuracy=0.9823 dev_loss=0.1140 dev_accuracy=0.9762
Epoch  6/10 2.0s train_loss=0.0381 train_lr=0.0035 train_accuracy=0.9876 dev_loss=0.1166 dev_accuracy=0.9770
Epoch  7/10 1.9s train_loss=0.0195 train_lr=0.0021 train_accuracy=0.9939 dev_loss=0.1022 dev_accuracy=0.9800
Epoch  8/10 1.9s train_loss=0.0097 train_lr=0.0010 train_accuracy=0.9972 dev_loss=0.1059 dev_accuracy=0.9808
Epoch  9/10 1.9s train_loss=0.0055 train_lr=0.0003 train_accuracy=0.9989 dev_loss=0.1073 dev_accuracy=0.9792
Epoch 10/10 1.9s train_loss=0.0040 train_lr=0.0001 train_accuracy=0.9993 dev_loss=0.1071 dev_accuracy=0.9792
Next learning rate to be used: 0.0001
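
The train_lr values in the logs above are consistent with the usual closed-form decay schedules. The following is a minimal pure-Python sketch, assuming the learning rate depends only on the fraction t of training already completed; the function name decayed_lr is hypothetical and not part of the template.

```python
import math


def decayed_lr(decay: str, lr_initial: float, lr_final: float, t: float) -> float:
    """Learning rate after a fraction t (0 <= t <= 1) of training is done."""
    if decay == "linear":
        return lr_initial + (lr_final - lr_initial) * t
    if decay == "exponential":
        return lr_initial * (lr_final / lr_initial) ** t
    if decay == "cosine":
        return lr_final + (lr_initial - lr_final) * (1 + math.cos(math.pi * t)) / 2
    raise ValueError(f"Unknown decay {decay!r}")


# Learning rates at the end of each of 10 epochs, rounded to 4 decimals.
for decay, lr_final in [("linear", 0.0001), ("exponential", 0.001), ("cosine", 0.0001)]:
    rates = [round(decayed_lr(decay, 0.01, lr_final, epoch / 10), 4) for epoch in range(1, 11)]
    print(decay, rates)
```

At t = 0.1 the three schedules give roughly 0.0090, 0.0079 and 0.0098, which agrees with the train_lr logged after the first epoch of the corresponding runs.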

gym_cartpole

Deadline: Mar 12, 22:00 3 points

Solve the CartPole-v1 environment from the Gymnasium library, using only the provided supervised training dataset of 100 examples. Start with the gym_cartpole.py template.

The solution to this task should be a model which passes evaluation on random inputs. This evaluation can be performed by running gym_cartpole.py with the --evaluate argument (optionally rendering if the --render option is provided), or by directly calling the evaluate_model method. In order to pass, you must achieve an average reward of at least 475 on 100 episodes. Your model should have two outputs (i.e., corresponding to a categorical distribution with 2 output classes).

When designing the model, you should consider that the size of the training data is very small and the data is quite noisy.

When submitting to ReCodEx, do not forget to also submit the trained model.

mnist_regularization

Deadline: Mar 19, 22:00 3 points

You will learn how to implement three regularization methods in this assignment. Start with the mnist_regularization.py template and implement the following:

Allow using dropout with rate args.dropout. Add a dropout layer after the first Flatten and also after all Linear hidden layers (but not after the output layer).
Allow using AdamW with weight decay with strength of args.weight_decay, making sure the weight decay is not applied on bias.
Allow using label smoothing with weight args.label_smoothing.
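
As an illustration of the last item, label smoothing with weight α is commonly defined as mixing the one-hot target with the uniform distribution, which is also how the label_smoothing argument of torch.nn.CrossEntropyLoss behaves. The sketch below is pure Python and only shows the arithmetic; the helper names smoothed_targets and cross_entropy are hypothetical and do not come from the template.

```python
import math


def smoothed_targets(label: int, num_classes: int, smoothing: float) -> list[float]:
    """Mix the one-hot distribution of `label` with the uniform distribution."""
    uniform = smoothing / num_classes
    return [(1.0 - smoothing) * (k == label) + uniform for k in range(num_classes)]


def cross_entropy(logits: list[float], targets: list[float]) -> float:
    """Cross-entropy between a target distribution and softmax(logits)."""
    maximum = max(logits)  # subtract the maximum for numerical stability
    log_norm = maximum + math.log(sum(math.exp(z - maximum) for z in logits))
    return -sum(t * (z - log_norm) for t, z in zip(targets, logits))


targets = smoothed_targets(label=2, num_classes=4, smoothing=0.1)
print(targets)
print(cross_entropy([0.0, 0.0, 5.0, 0.0], targets))
```

Because the smoothed target is never a one-hot vector, the loss cannot reach zero, which is one reason the reported losses grow with the smoothing weight while accuracy does not suffer.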

In addition to submitting the task in ReCodEx, also run the following variations and observe the results in TensorBoard, notably the training, development and test set accuracy and loss:

dropout rate 0, 0.3, 0.5, 0.6, 0.8;
weight decay 0, 0.1, 0.3, 0.5, 1.0;
label smoothing 0, 0.1, 0.3, 0.5.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 mnist_regularization.py --epochs=1 --dropout=0.3

Epoch 1/1 0.4s train_loss=0.7775 train_accuracy=0.7704 dev_loss=0.3211 dev_accuracy=0.9122

python3 mnist_regularization.py --epochs=1 --dropout=0.5 --hidden_layers 300 300

Epoch 1/1 0.4s train_loss=1.5365 train_accuracy=0.4824 dev_loss=0.5010 dev_accuracy=0.8680

python3 mnist_regularization.py --epochs=1 --weight_decay=0.1

Epoch 1/1 0.4s train_loss=0.5948 train_accuracy=0.8386 dev_loss=0.2868 dev_accuracy=0.9206

python3 mnist_regularization.py --epochs=1 --weight_decay=0.3

Epoch 1/1 0.4s train_loss=0.5969 train_accuracy=0.8386 dev_loss=0.2890 dev_accuracy=0.9206

python3 mnist_regularization.py --epochs=1 --label_smoothing=0.1

Epoch 1/1 0.4s train_loss=0.9841 train_accuracy=0.8442 dev_loss=0.7734 dev_accuracy=0.9244

python3 mnist_regularization.py --epochs=1 --label_smoothing=0.3

Epoch 1/1 0.4s train_loss=1.5040 train_accuracy=0.8458 dev_loss=1.3727 dev_accuracy=0.9312

mnist_ensemble

Deadline: Mar 19, 22:00 2 points

Your goal in this assignment is to implement ensembling of classification models by averaging their predicted probability distributions. The mnist_ensemble.py template trains args.models individual models, and your goal is to perform an ensemble of the first model, first two models, first three models, …, all models, and evaluate their accuracy on the development set.
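
The prefix ensembling described above can be sketched in pure Python as follows. This is only an illustration on toy data, assuming each model's predictions are given as nested lists of probabilities; the function name prefix_ensemble_accuracies is hypothetical and the template works with PyTorch tensors instead.

```python
def prefix_ensemble_accuracies(model_probs, labels):
    """Accuracies (in percent) of ensembles of the first 1, 2, ..., all models.

    model_probs[m][i] is the distribution predicted by model m for example i.
    """
    num_examples = len(labels)
    num_classes = len(model_probs[0][0])
    running_sum = [[0.0] * num_classes for _ in range(num_examples)]
    accuracies = []
    for probs in model_probs:
        correct = 0
        for i, distribution in enumerate(probs):
            for k, p in enumerate(distribution):
                running_sum[i][k] += p
            # The argmax of the sum equals the argmax of the average.
            prediction = max(range(num_classes), key=running_sum[i].__getitem__)
            correct += prediction == labels[i]
        accuracies.append(100 * correct / num_examples)
    return accuracies


# Three hypothetical models, two examples, two classes.
model_probs = [
    [[0.6, 0.4], [0.7, 0.3]],  # model 1: correct on example 0 only
    [[0.2, 0.8], [0.1, 0.9]],  # model 2: correct on example 1 only
    [[0.9, 0.1], [0.4, 0.6]],  # model 3: correct on both examples
]
labels = [0, 1]
print(prefix_ensemble_accuracies(model_probs, labels))
```

Keeping a running sum means every model's predictions are computed only once, regardless of how many prefix ensembles are evaluated.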

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 mnist_ensemble.py --epochs=1 --models=5

Model 1, individual accuracy 96.08, ensemble accuracy 96.08
Model 2, individual accuracy 96.18, ensemble accuracy 96.48
Model 3, individual accuracy 96.02, ensemble accuracy 96.58
Model 4, individual accuracy 95.94, ensemble accuracy 96.64
Model 5, individual accuracy 96.14, ensemble accuracy 96.66

python3 mnist_ensemble.py --epochs=1 --models=5 --hidden_layer_size=200

Model 1, individual accuracy 96.58, ensemble accuracy 96.58
Model 2, individual accuracy 96.70, ensemble accuracy 96.80
Model 3, individual accuracy 96.70, ensemble accuracy 97.04
Model 4, individual accuracy 96.96, ensemble accuracy 97.14
Model 5, individual accuracy 96.76, ensemble accuracy 97.12

python3 mnist_ensemble.py --models=5

Model 1, individual accuracy 97.76, ensemble accuracy 97.76
Model 2, individual accuracy 97.90, ensemble accuracy 98.08
Model 3, individual accuracy 97.92, ensemble accuracy 98.30
Model 4, individual accuracy 98.02, ensemble accuracy 98.36
Model 5, individual accuracy 97.86, ensemble accuracy 98.38

python3 mnist_ensemble.py --models=5 --hidden_layer_size=200

Model 1, individual accuracy 98.10, ensemble accuracy 98.10
Model 2, individual accuracy 98.20, ensemble accuracy 98.42
Model 3, individual accuracy 97.90, ensemble accuracy 98.44
Model 4, individual accuracy 97.96, ensemble accuracy 98.46
Model 5, individual accuracy 97.90, ensemble accuracy 98.58

uppercase

Deadline: Mar 19, 22:00 4 points+5 bonus

This assignment introduces the first NLP task. Your goal is to implement a model which is given Czech lowercased text and tries to uppercase appropriate letters. To load the dataset, use the npfl138.datasets.uppercase_data module which loads (and if required also downloads) the data. While the training and the development sets are in correct case, the test set is lowercased.

This is an open-data task, where you submit only the uppercased test set together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py/ipynb file.

The task is also a competition. Everyone who submits a solution achieving at least 98.5% accuracy gets 4 basic points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions. The accuracy is computed per-character and can be evaluated programmatically using the UppercaseData.evaluate_file method, or by running the python3 -m npfl138.datasets.uppercase_data command with the --evaluate argument.

Start with the uppercase.py template, which uses the npfl138.datasets.uppercase_data to load the data, generate an alphabet of given size containing most frequent characters, and generate sliding window view on the data. The template also comments on possibilities of character representation.
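
To make the alphabet and sliding-window idea concrete, here is a minimal pure-Python sketch. It is not the template's implementation: the function names, the reserved ids (0 for padding, 1 for unknown characters) and the toy text are all assumptions made for illustration.

```python
from collections import Counter


def build_alphabet(text: str, size: int) -> dict[str, int]:
    """Map the most frequent characters to ids; 0 is padding and 1 is unknown."""
    most_common = [char for char, _ in Counter(text).most_common(size - 2)]
    return {char: index + 2 for index, char in enumerate(most_common)}


def sliding_windows(text: str, alphabet: dict[str, int], window: int) -> list[list[int]]:
    """For every position, ids of the character and `window` neighbors on each side."""
    ids = [alphabet.get(char, 1) for char in text]
    padded = [0] * window + ids + [0] * window
    return [padded[i:i + 2 * window + 1] for i in range(len(ids))]


original = "Ahoj Praho"
lowercased = original.lower()
# One binary target per character: should this character be uppercased?
labels = [int(char.isupper()) for char in original]

alphabet = build_alphabet(lowercased, size=6)
windows = sliding_windows(lowercased, alphabet, window=2)
for window_ids, label in zip(windows, labels):
    print(window_ids, label)
```

Each window becomes one fixed-size training example, so the task reduces to binary classification of the middle character, which fully connected layers can handle.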

Do not use RNNs, CNNs, or Transformer in this task (if you have doubts, contact me); fully connected layers (and therefore also embedding layers), any activations, residual connections, and any regularization layers are fine.

mnist_cnn

Deadline: Mar 26, 22:00 3 points

To pass this assignment, you will learn to construct basic convolutional neural network layers. Start with the mnist_cnn.py template and assume the requested architecture is described by the cnn argument, which contains comma-separated specifications of the following layers:

C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
CB-filters-kernel_size-stride-padding: Same as C-filters-kernel_size-stride-padding, but use batch normalization. In detail, start with a convolutional layer without bias and activation, then add batch normalization layer, and finally the ReLU activation. Example: CB-10-3-1-same
M-pool_size-stride: Add max pooling with specified size and stride, using the default padding of 0 (the "valid" padding). Example: M-3-2
R-[layers]: Add a residual connection. The layers contain a specification of at least one convolutional layer (but not a recursive residual connection R). The input to the R layer should be processed sequentially by layers, and the produced output (after the ReLU nonlinearity of the last layer) should be added to the input (of this R layer). Example: R-[C-16-3-1-same,C-16-3-1-same]
F: Flatten inputs. Must appear exactly once in the architecture.
H-hidden_layer_size: Add a dense layer with ReLU activation and specified size. Example: H-100
D-dropout_rate: Apply dropout with the given dropout rate. Example: D-0.5

An example architecture might be --cnn=CB-16-5-2-valid,M-3-2,F,H-100,D-0.5. You can assume the resulting network is valid; it is fine to crash if it is not.
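
One possible way to turn such a specification string into a structured description is sketched below. It is pure Python and stops short of building any PyTorch layers; the function names split_top_level and parse_layer are hypothetical, and the only non-trivial part is splitting on commas that are not nested inside the brackets of an R layer.

```python
def split_top_level(spec: str) -> list[str]:
    """Split on commas that are not nested inside [...] brackets."""
    parts, depth, current = [], 0, ""
    for char in spec:
        if char == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        depth += char == "["
        depth -= char == "]"
        current += char
    if current:
        parts.append(current)
    return parts


def parse_layer(layer: str):
    """Turn one layer specification into a (kind, arguments) tuple."""
    kind, _, rest = layer.partition("-")
    if kind in ("C", "CB"):
        filters, kernel_size, stride, padding = rest.split("-")
        return kind, (int(filters), int(kernel_size), int(stride), padding)
    if kind == "M":
        pool_size, stride = rest.split("-")
        return kind, (int(pool_size), int(stride))
    if kind == "R":
        # Strip the surrounding brackets and parse the inner layers recursively.
        return kind, tuple(parse_layer(inner) for inner in split_top_level(rest[1:-1]))
    if kind == "F":
        return kind, ()
    if kind == "H":
        return kind, (int(rest),)
    if kind == "D":
        return kind, (float(rest),)
    raise ValueError(f"Unknown layer specification {layer!r}")


for layer in split_top_level("CB-16-5-2-valid,R-[C-16-3-1-same,C-16-3-1-same],F,H-100,D-0.5"):
    print(parse_layer(layer))
```

A model builder could then iterate over the parsed tuples and append the corresponding modules, tracking the current number of channels along the way.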

After a successful ReCodEx submission, you can try obtaining the best accuracy on MNIST and then advance to cifar_competition.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 mnist_cnn.py --epochs=1 --cnn=F,H-100

Epoch 1/1 1.6s train_loss=0.3178 train_accuracy=0.9104 dev_loss=0.1482 dev_accuracy=0.9566

python3 mnist_cnn.py --epochs=1 --cnn=F,H-100,D-0.5

Epoch 1/1 1.5s train_loss=0.4831 train_accuracy=0.8574 dev_loss=0.1584 dev_accuracy=0.9556

python3 mnist_cnn.py --epochs=1 --cnn=M-5-2,F,H-50

Epoch 1/1 1.4s train_loss=0.7251 train_accuracy=0.7780 dev_loss=0.4007 dev_accuracy=0.8820

python3 mnist_cnn.py --epochs=1 --cnn=C-8-3-5-valid,C-8-3-2-valid,F,H-50

Epoch 1/1 1.9s train_loss=0.8031 train_accuracy=0.7437 dev_loss=0.3459 dev_accuracy=0.9000

python3 mnist_cnn.py --epochs=1 --cnn=CB-6-3-5-valid,F,H-32

Epoch 1/1 1.7s train_loss=0.6422 train_accuracy=0.8009 dev_loss=0.2784 dev_accuracy=0.9184

python3 mnist_cnn.py --epochs=1 --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50

Epoch 1/1 2.6s train_loss=0.4411 train_accuracy=0.8620 dev_loss=0.1888 dev_accuracy=0.9436

python3 mnist_cnn.py --cnn=F,H-100

Epoch  1/10 1.4s train_loss=0.3178 train_accuracy=0.9104 dev_loss=0.1482 dev_accuracy=0.9566
Epoch  2/10 1.6s train_loss=0.1502 train_accuracy=0.9560 dev_loss=0.1049 dev_accuracy=0.9720
Epoch  3/10 1.5s train_loss=0.1048 train_accuracy=0.9692 dev_loss=0.0939 dev_accuracy=0.9720
Epoch  4/10 1.5s train_loss=0.0812 train_accuracy=0.9757 dev_loss=0.0856 dev_accuracy=0.9760
Epoch  5/10 1.6s train_loss=0.0627 train_accuracy=0.9811 dev_loss=0.0827 dev_accuracy=0.9786
Epoch  6/10 1.6s train_loss=0.0516 train_accuracy=0.9846 dev_loss=0.0749 dev_accuracy=0.9794
Epoch  7/10 1.6s train_loss=0.0420 train_accuracy=0.9869 dev_loss=0.0726 dev_accuracy=0.9796
Epoch  8/10 1.6s train_loss=0.0331 train_accuracy=0.9903 dev_loss=0.0733 dev_accuracy=0.9800
Epoch  9/10 1.6s train_loss=0.0275 train_accuracy=0.9918 dev_loss=0.0782 dev_accuracy=0.9782
Epoch 10/10 1.6s train_loss=0.0240 train_accuracy=0.9930 dev_loss=0.0782 dev_accuracy=0.9810

python3 mnist_cnn.py --cnn=F,H-100,D-0.5

Epoch  1/10 1.5s train_loss=0.4831 train_accuracy=0.8574 dev_loss=0.1584 dev_accuracy=0.9556
Epoch  2/10 1.6s train_loss=0.2754 train_accuracy=0.9197 dev_loss=0.1225 dev_accuracy=0.9666
Epoch  3/10 1.7s train_loss=0.2279 train_accuracy=0.9327 dev_loss=0.1021 dev_accuracy=0.9716
Epoch  4/10 1.7s train_loss=0.2027 train_accuracy=0.9401 dev_loss=0.0951 dev_accuracy=0.9714
Epoch  5/10 1.7s train_loss=0.1892 train_accuracy=0.9441 dev_loss=0.0914 dev_accuracy=0.9752
Epoch  6/10 1.7s train_loss=0.1751 train_accuracy=0.9468 dev_loss=0.0821 dev_accuracy=0.9746
Epoch  7/10 1.7s train_loss=0.1659 train_accuracy=0.9495 dev_loss=0.0783 dev_accuracy=0.9760
Epoch  8/10 1.7s train_loss=0.1551 train_accuracy=0.9526 dev_loss=0.0768 dev_accuracy=0.9764
Epoch  9/10 1.7s train_loss=0.1487 train_accuracy=0.9545 dev_loss=0.0828 dev_accuracy=0.9776
Epoch 10/10 1.7s train_loss=0.1446 train_accuracy=0.9548 dev_loss=0.0770 dev_accuracy=0.9776

python3 mnist_cnn.py --cnn=F,H-200,D-0.5

Epoch  1/10 2.0s train_loss=0.3757 train_accuracy=0.8880 dev_loss=0.1266 dev_accuracy=0.9680
Epoch  2/10 2.2s train_loss=0.1991 train_accuracy=0.9414 dev_loss=0.0981 dev_accuracy=0.9734
Epoch  3/10 2.3s train_loss=0.1595 train_accuracy=0.9528 dev_loss=0.0858 dev_accuracy=0.9762
Epoch  4/10 2.4s train_loss=0.1352 train_accuracy=0.9588 dev_loss=0.0786 dev_accuracy=0.9782
Epoch  5/10 2.4s train_loss=0.1217 train_accuracy=0.9637 dev_loss=0.0739 dev_accuracy=0.9806
Epoch  6/10 2.4s train_loss=0.1089 train_accuracy=0.9669 dev_loss=0.0692 dev_accuracy=0.9818
Epoch  7/10 2.4s train_loss=0.1017 train_accuracy=0.9678 dev_loss=0.0699 dev_accuracy=0.9814
Epoch  8/10 2.4s train_loss=0.0958 train_accuracy=0.9703 dev_loss=0.0691 dev_accuracy=0.9812
Epoch  9/10 2.4s train_loss=0.0859 train_accuracy=0.9723 dev_loss=0.0656 dev_accuracy=0.9828
Epoch 10/10 2.4s train_loss=0.0860 train_accuracy=0.9725 dev_loss=0.0674 dev_accuracy=0.9834

python3 mnist_cnn.py --cnn=C-8-3-1-same,C-8-3-1-same,M-3-2,C-16-3-1-same,C-16-3-1-same,M-3-2,F,H-200

Epoch  1/10 13.2s train_loss=0.1713 train_accuracy=0.9475 dev_loss=0.0649 dev_accuracy=0.9824
Epoch  2/10 18.8s train_loss=0.0521 train_accuracy=0.9833 dev_loss=0.0369 dev_accuracy=0.9892
Epoch  3/10 19.0s train_loss=0.0383 train_accuracy=0.9881 dev_loss=0.0327 dev_accuracy=0.9916
Epoch  4/10 16.6s train_loss=0.0290 train_accuracy=0.9906 dev_loss=0.0338 dev_accuracy=0.9900
Epoch  5/10 13.1s train_loss=0.0247 train_accuracy=0.9922 dev_loss=0.0330 dev_accuracy=0.9898
Epoch  6/10 13.1s train_loss=0.0201 train_accuracy=0.9935 dev_loss=0.0369 dev_accuracy=0.9902
Epoch  7/10 16.1s train_loss=0.0176 train_accuracy=0.9945 dev_loss=0.0358 dev_accuracy=0.9910
Epoch  8/10 19.0s train_loss=0.0147 train_accuracy=0.9953 dev_loss=0.0315 dev_accuracy=0.9932
Epoch  9/10 18.4s train_loss=0.0138 train_accuracy=0.9954 dev_loss=0.0278 dev_accuracy=0.9932
Epoch 10/10 13.2s train_loss=0.0104 train_accuracy=0.9966 dev_loss=0.0334 dev_accuracy=0.9922

python3 mnist_cnn.py --cnn=CB-8-3-1-same,CB-8-3-1-same,M-3-2,CB-16-3-1-same,CB-16-3-1-same,M-3-2,F,H-200

Epoch  1/10 14.3s train_loss=0.1443 train_accuracy=0.9569 dev_loss=0.0501 dev_accuracy=0.9854
Epoch  2/10 19.5s train_loss=0.0517 train_accuracy=0.9838 dev_loss=0.0678 dev_accuracy=0.9792
Epoch  3/10 20.7s train_loss=0.0399 train_accuracy=0.9873 dev_loss=0.0369 dev_accuracy=0.9892
Epoch  4/10 20.5s train_loss=0.0337 train_accuracy=0.9893 dev_loss=0.0376 dev_accuracy=0.9900
Epoch  5/10 17.7s train_loss=0.0281 train_accuracy=0.9908 dev_loss=0.0264 dev_accuracy=0.9926
Epoch  6/10 14.9s train_loss=0.0226 train_accuracy=0.9929 dev_loss=0.0384 dev_accuracy=0.9900
Epoch  7/10 14.9s train_loss=0.0203 train_accuracy=0.9935 dev_loss=0.0516 dev_accuracy=0.9864
Epoch  8/10 21.7s train_loss=0.0179 train_accuracy=0.9941 dev_loss=0.0381 dev_accuracy=0.9908
Epoch  9/10 20.4s train_loss=0.0151 train_accuracy=0.9949 dev_loss=0.0348 dev_accuracy=0.9918
Epoch 10/10 19.0s train_loss=0.0157 train_accuracy=0.9946 dev_loss=0.0319 dev_accuracy=0.9918

torch_dataset

Deadline: Mar 26, 22:00 2 points

In this assignment you will familiarize yourselves with torch.utils.data, which is a PyTorch way of constructing training datasets. If you want, you can read the Dataset and DataLoaders tutorial.

The goal of this assignment is to start with the torch_dataset.py template and implement a simple image augmentation preprocessing. The template also shows you how to use the npfl138.TransformedDataset module.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 torch_dataset.py --epochs=1 --batch_size=100

Epoch 1/1 1.0s train_loss=2.1280 train_accuracy=0.1872 dev_loss=1.9093 dev_accuracy=0.2800

python3 torch_dataset.py --epochs=1 --batch_size=50 --augment

Epoch 1/1 1.6s train_loss=2.1056 train_accuracy=0.2052 dev_loss=1.9050 dev_accuracy=0.2970

mnist_multiple

Deadline: Mar 26, 22:00 3 points

In this assignment you will implement a model with multiple inputs and outputs. Start with the mnist_multiple.py template.

The goal is to create a model, which given two input MNIST images, compares if the digit on the first one is greater than on the second one.
We perform this comparison in two different ways:
- first by directly predicting the comparison by the network (direct comparison),
- then by first classifying the images into digits and then comparing these predictions (indirect comparison).
The model has four outputs:
- direct comparison whether the first digit is greater than the second one,
- digit classification for the first image,
- digit classification for the second image,
- indirect comparison comparing the digits predicted by the above two outputs.
You need to implement:
- the model, using multiple inputs, outputs, losses and metrics;
- construction of two-image dataset examples using regular MNIST data via the PyTorch datasets.
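
The relation between the outputs can be illustrated without any neural network. The sketch below is pure Python and makes two assumptions that are not stated in the text: that pairs are formed from consecutive examples, and that the indirect comparison uses the argmax of each digit classifier. The names make_pairs and indirect_comparison are hypothetical.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def indirect_comparison(digit_logits_a, digit_logits_b) -> bool:
    """Compare the digits predicted for the first and the second image."""
    return argmax(digit_logits_a) > argmax(digit_logits_b)


def make_pairs(examples):
    """Pair consecutive (image, label) examples into two-image examples.

    Each result is ((image_a, image_b), (direct_target, label_a, label_b)).
    """
    pairs = []
    for (image_a, label_a), (image_b, label_b) in zip(examples[0::2], examples[1::2]):
        pairs.append(((image_a, image_b), (int(label_a > label_b), label_a, label_b)))
    return pairs


# Strings stand in for image tensors in this toy example.
examples = [("img0", 3), ("img1", 5), ("img2", 7), ("img3", 2)]
print(make_pairs(examples))
print(indirect_comparison([0.1, 0.9, 0.0], [0.8, 0.1, 0.1]))
```

Note that the indirect comparison has no trainable parameters of its own; it is only a metric computed from the two digit classification outputs.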

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 mnist_multiple.py --epochs=1 --batch_size=50

Epoch 1/1 3.8s train_loss=0.8910 train_direct_comparison=0.8716 train_indirect_comparison=0.9439 dev_loss=0.2992 dev_direct_comparison=0.9544 dev_indirect_comparison=0.9832

python3 mnist_multiple.py --epochs=1 --batch_size=100

Epoch 1/1 3.6s train_loss=1.1647 train_direct_comparison=0.8425 train_indirect_comparison=0.9268 dev_loss=0.4307 dev_direct_comparison=0.9284 dev_indirect_comparison=0.9772

cifar_competition

Deadline: Mar 26, 22:00 4 points+5 bonus

The goal of this assignment is to devise the best possible model for CIFAR-10. You can load the data using the npfl138.datasets.cifar10 module. Note that the test set is different from that of the official CIFAR-10.

The task is a competition. Everyone who submits a solution achieving at least 70% test set accuracy gets 4 points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions. You can evaluate a generated file with predictions by running the python3 -m npfl138.datasets.cifar10 --evaluate PREDICTIONS_FILE_PATH --dataset dev/test; however, only the --dataset dev produces valid accuracy since you do not have the test set annotations.

Note that my solutions usually need to achieve around 85% on the development set to score 70% on the test set.

You may want to start with the cifar_competition.py template which generates the test set annotation in the required format.

In this assignment, you must implement and train your model from scratch, so you cannot use the torchvision.models or the timm package.

cnn_manual

Deadline: Apr 09, 22:00 3 points Slides Recording

To pass this assignment, you need to manually implement the forward and backward pass through a 2D convolutional layer. Start with the cnn_manual.py template, which constructs a series of 2D convolutional layers with ReLU activation and valid padding, specified in the args.cnn option. The args.cnn contains comma-separated layer specifications in the format filters-kernel_size-stride.

In this assignment, we use the channels_last (NHWC) format; therefore, images have shape [batch_size, height, width, channels] and the convolutional kernel has shape [kernel_height, kernel_width, in_channels, out_channels]. These shapes are consistent with the course slides, but are different from the native PyTorch format.

Of course, you cannot use any PyTorch convolutional operation (including torch.nn.{Fold/Unfold} and torch.nn.functional.{fold/unfold}) nor the .backward() for gradient computation. Instead, implement convolution and gradient computations using matrix multiplication and other basic operations (element-wise multiplication, summation, etc.; if you want, you can read about torch.einsum).
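
When debugging, it can help to have a slow but obviously correct reference for the forward pass. The following pure-Python sketch computes a valid-padding convolution of a single image in the HWC layout with a kernel of shape [kernel_height, kernel_width, in_channels, out_channels], using nested lists instead of tensors. It is only a reference for small inputs, not the vectorized solution the assignment asks for, and the function name conv2d_valid is hypothetical.

```python
def conv2d_valid(image, kernel, stride):
    """Naive cross-correlation of one HWC image with a [kh, kw, in, out] kernel."""
    height, width, in_channels = len(image), len(image[0]), len(image[0][0])
    kh, kw, out_channels = len(kernel), len(kernel[0]), len(kernel[0][0][0])
    # With valid padding, only windows that fit entirely in the image are used.
    out_height = (height - kh) // stride + 1
    out_width = (width - kw) // stride + 1
    output = [[[0.0] * out_channels for _ in range(out_width)] for _ in range(out_height)]
    for y in range(out_height):
        for x in range(out_width):
            for i in range(kh):
                for j in range(kw):
                    for c in range(in_channels):
                        value = image[y * stride + i][x * stride + j][c]
                        for o in range(out_channels):
                            output[y][x][o] += value * kernel[i][j][c][o]
    return output


# A 3x3 single-channel image with values 1..9 and a 2x2 kernel of ones.
image = [[[float(3 * row + col + 1)] for col in range(3)] for row in range(3)]
kernel = [[[[1.0]] for _ in range(2)] for _ in range(2)]
print(conv2d_valid(image, kernel, stride=1))
```

The six nested loops correspond directly to the summation indices of the convolution, which is also the structure an einsum-based implementation expresses in one call.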

To make debugging easier, the template supports a --verify option, which allows comparing the forward pass and the three gradients you compute in the backward pass to correct values.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 cnn_manual.py --epochs=1 --cnn=5-1-1

Dev accuracy after epoch 1 is 88.16
Test accuracy after epoch 1 is 85.91

python3 cnn_manual.py --epochs=1 --cnn=5-3-1

Dev accuracy after epoch 1 is 88.26
Test accuracy after epoch 1 is 86.12

python3 cnn_manual.py --epochs=1 --cnn=5-3-2

Dev accuracy after epoch 1 is 87.44
Test accuracy after epoch 1 is 84.97

python3 cnn_manual.py --epochs=1 --cnn=5-3-2,10-3-2

Dev accuracy after epoch 1 is 85.14
Test accuracy after epoch 1 is 82.26

python3 cnn_manual.py --epochs=1 --cnn=30-1-1,20-3-2

Dev accuracy after epoch 1 is 89.36
Test accuracy after epoch 1 is 86.77

cags_classification

Deadline: Apr 02, 22:00 4 points+5 bonus

The goal of this assignment is to use a pretrained model, for example the EfficientNetV2-B0, to achieve the best accuracy in CAGS classification.

The CAGS dataset consists of images of cats and dogs of size $224×224$, each classified into one of 34 breeds and each containing a mask indicating the presence of the animal. To load the dataset, use the npfl138.datasets.cags module.

The template includes the code loading the EfficientNetV2-B0 model using the timm library, which automatically downloads the model weights. However, you can use any model from the timm library in this assignment.

An example performing classification of given images is available in image_classification.py.

A note on finetuning: you should start by training only the newly added classifier. To that end pass only the classifier parameters to the optimizer you want to use. If you want to finetune the whole model later, you should create another optimizer and pass it to the TrainableModule using another configure call.

The task is a competition. Everyone who submits a solution achieving at least 93% test set accuracy gets 4 points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions.

You may want to start with the cags_classification.py template which generates the test set annotation in the required format.

cags_segmentation

Deadline: Apr 02, 22:00 4 points+5 bonus

The goal of this assignment is to use a pretrained model, for example the EfficientNetV2-B0, to achieve the best image segmentation IoU score on the CAGS dataset. The dataset and the EfficientNetV2-B0 are described in the cags_classification assignment. Nevertheless, you can again use any model from the timm library in this assignment.

A mask is evaluated using intersection over union (IoU) metric, which is the intersection of the gold and predicted mask divided by their union, and the whole test set score is the average of its masks' IoU. A TrainableModule-compatible metric is implemented by the class MaskIoUMetric of the npfl138.datasets.cags module, which can also evaluate your predictions (either by running with python3 -m npfl138.datasets.cags --evaluate_segmentation=path --dataset=dev/test arguments, or using its evaluate_segmentation_file method) and also visualize your predictions (using python3 -m npfl138.datasets.cags --visualize_segmentation=path --dataset=dev/test).
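
The metric itself is simple to state in code. The sketch below is a pure-Python illustration on flattened binary masks, not the MaskIoUMetric class; the function names are hypothetical, and returning 1.0 when both masks are empty is an assumption, since the text does not say how that case is handled.

```python
def mask_iou(gold, predicted) -> float:
    """Intersection over union of two flattened 0/1 masks of equal length."""
    intersection = sum(g and p for g, p in zip(gold, predicted))
    union = sum(g or p for g, p in zip(gold, predicted))
    # Assumption: two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


def dataset_iou(gold_masks, predicted_masks) -> float:
    """Average of the per-mask IoU scores."""
    scores = [mask_iou(g, p) for g, p in zip(gold_masks, predicted_masks)]
    return sum(scores) / len(scores)


gold = [0, 1, 1, 0]
predicted = [0, 1, 0, 1]
print(mask_iou(gold, predicted))
```

Averaging per-mask scores, rather than pooling all pixels, gives every image the same weight regardless of how large the animal in it is.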

The task is a competition. Everyone who submits a solution achieving at least 87% test set IoU gets 4 points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions.

You may want to start with the cags_segmentation.py template, which generates the test set annotation in the required format – each mask should be encoded on a single line as a space separated sequence of integers indicating the length of alternating runs of zeros and ones.
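
The run-length format can be illustrated with a short pure-Python encoder and decoder. The sketch assumes that the mask is flattened in row-major order and that the first number always counts zeros (so it is 0 when the mask starts with a one); both are assumptions based on the description, and the template already generates the required format for you. The function names are hypothetical.

```python
def encode_mask(mask) -> str:
    """Encode a flattened 0/1 mask as lengths of alternating runs, zeros first."""
    runs, current, length = [], 0, 0
    for value in mask:
        if value == current:
            length += 1
        else:
            runs.append(length)
            current, length = value, 1
    runs.append(length)
    return " ".join(map(str, runs))


def decode_mask(line: str) -> list[int]:
    """Inverse of encode_mask."""
    mask, value = [], 0
    for run in map(int, line.split()):
        mask.extend([value] * run)
        value = 1 - value
    return mask


print(encode_mask([0, 0, 1, 1, 1, 0]))
print(encode_mask([1, 1, 0]))
print(decode_mask("2 3 1"))
```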

bboxes_utils

Deadline: Apr 09, 22:00 2 points

This is a preparatory assignment for svhn_competition. The goal is to implement several bounding box manipulation routines in the bboxes_utils.py module. Notably, you need to implement the following methods:

bboxes_to_rcnn: convert given bounding boxes to a R-CNN-like representation relative to the given anchors;
bboxes_from_rcnn: convert R-CNN-like representations relative to given anchors back to bounding boxes;
bboxes_training: given a list of anchors and gold objects, assign gold objects to anchors and generate suitable training data (the exact algorithm is described in the template).
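
For orientation, the commonly used R-CNN parametrization expresses a box relative to an anchor by the center offsets divided by the anchor size and by the logarithms of the size ratios. The sketch below assumes this parametrization and the [TOP, LEFT, BOTTOM, RIGHT] box order; the exact definition required by the assignment is the one given in the template, which may differ. It handles a single box in pure Python, whereas the assignment works with batched tensors, and the singular function names are hypothetical.

```python
import math


def bbox_to_rcnn(anchor, bbox):
    """Boxes are [top, left, bottom, right]; returns [dy, dx, log dh, log dw]."""
    a_height, a_width = anchor[2] - anchor[0], anchor[3] - anchor[1]
    b_height, b_width = bbox[2] - bbox[0], bbox[3] - bbox[1]
    a_y, a_x = anchor[0] + a_height / 2, anchor[1] + a_width / 2
    b_y, b_x = bbox[0] + b_height / 2, bbox[1] + b_width / 2
    return [(b_y - a_y) / a_height, (b_x - a_x) / a_width,
            math.log(b_height / a_height), math.log(b_width / a_width)]


def bbox_from_rcnn(anchor, rcnn):
    """Inverse of bbox_to_rcnn for the same anchor."""
    a_height, a_width = anchor[2] - anchor[0], anchor[3] - anchor[1]
    a_y, a_x = anchor[0] + a_height / 2, anchor[1] + a_width / 2
    y, x = rcnn[0] * a_height + a_y, rcnn[1] * a_width + a_x
    height, width = math.exp(rcnn[2]) * a_height, math.exp(rcnn[3]) * a_width
    return [y - height / 2, x - width / 2, y + height / 2, x + width / 2]


anchor = [0, 0, 10, 10]
print(bbox_to_rcnn(anchor, [5, 5, 15, 15]))
print(bbox_from_rcnn(anchor, bbox_to_rcnn(anchor, [2, 1, 8, 21])))
```

Dividing by the anchor size and taking logarithms makes the regression targets independent of the absolute scale of the anchor.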

The bboxes_utils.py module contains simple unit tests, which are evaluated when the module is executed and which you can use to check the validity of your implementation. Note that the template does not contain type annotations because the Python typing system is not flexible enough to describe the tensor shape changes.

When submitting to ReCodEx, the method main is executed, returning the implemented bboxes_to_rcnn, bboxes_from_rcnn and bboxes_training methods. These methods are then executed and compared to the reference implementation.

svhn_competition

Deadline: Apr 09, 22:00 5 points+5 bonus

The goal of this assignment is to implement a system performing object recognition, optionally utilizing the pretrained EfficientNetV2-B0 backbone (or any other model from the timm library).

The Street View House Numbers (SVHN) dataset annotates for every photo all digits appearing on it, including their bounding boxes. The dataset can be loaded using the npfl138.datasets.svhn module. Similarly to the CAGS dataset, the train/dev/test are PyTorch torch.utils.data.Datasets, and every element is a dictionary with the following keys:

"image": a square 3-channel image stored using torch.Tensor of type torch.uint8,
"classes": a 1D torch.Tensor with all digit labels appearing in the image,
"bboxes": a [num_digits, 4] 2D torch.Tensor with bounding boxes of every digit in the image, each represented as [TOP, LEFT, BOTTOM, RIGHT].

Each test set image annotation consists of a sequence of space-separated five-tuples label top left bottom right, and the annotation is considered correct if exactly the gold digits are predicted, each with IoU at least 0.5. The whole test set score is then the prediction accuracy of individual images. You can again evaluate your predictions using the npfl138.datasets.svhn module, either by running with python3 -m npfl138.datasets.svhn --evaluate=path --dataset=dev/test or using the svhn.evaluate method. Furthermore, you can visualize your predictions by using python3 -m npfl138.datasets.svhn --visualize=path --dataset=dev/test.
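
The IoU threshold in the evaluation refers to the overlap of a predicted and a gold bounding box. A minimal pure-Python sketch for two boxes in the [TOP, LEFT, BOTTOM, RIGHT] order follows; the function name bbox_iou is hypothetical, and this is an illustration rather than the evaluation code of the npfl138.datasets.svhn module.

```python
def bbox_iou(a, b) -> float:
    """Intersection over union of two [top, left, bottom, right] boxes."""
    area_a = max(a[2] - a[0], 0) * max(a[3] - a[1], 0)
    area_b = max(b[2] - b[0], 0) * max(b[3] - b[1], 0)
    # The intersection is empty when the boxes do not overlap in either axis.
    inter_height = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_width = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    intersection = inter_height * inter_width
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


gold = [0, 0, 10, 10]
predicted = [0, 5, 10, 15]
print(bbox_iou(gold, predicted))
```

In this example the boxes share half of their area, yet the IoU is only one third, so the prediction would not reach the 0.5 threshold.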

The task is a competition. Everyone who submits a solution achieving at least 20% test set accuracy gets 5 points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions. Note that I usually need at least 35% development set accuracy to achieve the required test set performance.

You should start with the svhn_competition.py template, which generates the test set annotation in the required format.

A baseline solution can use RetinaNet-like single stage detector, using only a single level of convolutional features (no FPN) with single-scale and single-aspect anchors. Focal loss is available as torchvision.ops.sigmoid_focal_loss and non-maximum suppression as torchvision.ops.nms or torchvision.ops.batched_nms.

sequence_classification

Deadline: Apr 16, 22:00 2 points

The goal of this assignment is to introduce recurrent neural networks, show their convergence speed, and illustrate the exploding gradient issue. The network should process sequences of 50 small integers and compute parity for each prefix of the sequence. The inputs are either 0/1, or vectors with a one-hot representation of a small integer.
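
The targets of this task can be computed in a few lines. The sketch below assumes that "parity of a prefix" means the parity of the sum of its elements, which is the natural reading but is not spelled out in the text; the function name prefix_parities is hypothetical.

```python
from itertools import accumulate


def prefix_parities(sequence):
    """Parity (0 = even, 1 = odd) of the sum of every prefix of the sequence."""
    return [total % 2 for total in accumulate(sequence)]


print(prefix_parities([1, 0, 1, 1]))
print(prefix_parities([2, 1, 3]))
```

Every target depends on all preceding inputs, so the network has to carry one bit of state across the whole sequence, which is what makes the task a useful probe of recurrent architectures.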

Your goal is to modify the sequence_classification.py template and implement the following:

Use the specified RNN type (RNN, GRU, and LSTM) and dimensionality.
Process the sequence using the required RNN.
Use additional hidden layer on the RNN outputs if requested.
Implement gradient clipping if requested.
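
Gradient clipping is usually performed by rescaling all gradients so that their joint norm does not exceed a threshold; in PyTorch, torch.nn.utils.clip_grad_norm_ performs this kind of rescaling in place. The pure-Python sketch below shows only the arithmetic, assuming clipping by the global L2 norm; whether the template expects exactly this variant is an assumption, and the function name clip_by_global_norm is hypothetical.

```python
import math


def clip_by_global_norm(gradients, max_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most max_norm.

    Returns the clipped gradients and the norm before clipping.
    """
    norm = math.sqrt(sum(g * g for gradient in gradients for g in gradient))
    # Gradients with a small enough norm are left unchanged.
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [[g * scale for g in gradient] for gradient in gradients], norm


clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm, clipped)
```

Logging the norm before clipping is a convenient way to spot exploding gradients in TensorBoard.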

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on how the RNNs converge, on convergence speed, on exploding gradient issues, and on how gradient clipping helps:

--rnn=RNN --sequence_dim=1, --rnn=GRU --sequence_dim=1, --rnn=LSTM --sequence_dim=1
the same as above but with --sequence_dim=3
the same as above but with --sequence_dim=10
--rnn=RNN --hidden_layer=75 --rnn_dim=30 --sequence_dim=30 and the same with --clip_gradient=1
the same as above but with --rnn=GRU with and without --clip_gradient=1
the same as above but with --rnn=LSTM with and without --clip_gradient=1

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=RNN --epochs=5

Epoch 1/5 0.1s train_loss=0.7008 train_accuracy=0.4762 dev_loss=0.6952 dev_accuracy=0.4778
Epoch 2/5 0.1s train_loss=0.6938 train_accuracy=0.5011 dev_loss=0.6926 dev_accuracy=0.4979
Epoch 3/5 0.1s train_loss=0.6922 train_accuracy=0.5215 dev_loss=0.6918 dev_accuracy=0.5395
Epoch 4/5 0.1s train_loss=0.6913 train_accuracy=0.5423 dev_loss=0.6912 dev_accuracy=0.5362
Epoch 5/5 0.1s train_loss=0.6909 train_accuracy=0.5508 dev_loss=0.6909 dev_accuracy=0.5405

python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=GRU --epochs=5

Epoch 1/5 0.2s train_loss=0.6960 train_accuracy=0.4760 dev_loss=0.6943 dev_accuracy=0.4747
Epoch 2/5 0.2s train_loss=0.6936 train_accuracy=0.4967 dev_loss=0.6931 dev_accuracy=0.5026
Epoch 3/5 0.2s train_loss=0.6926 train_accuracy=0.5062 dev_loss=0.6924 dev_accuracy=0.5296
Epoch 4/5 0.2s train_loss=0.6920 train_accuracy=0.5307 dev_loss=0.6919 dev_accuracy=0.5267
Epoch 5/5 0.2s train_loss=0.6917 train_accuracy=0.5307 dev_loss=0.6915 dev_accuracy=0.5321

python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5

Epoch 1/5 0.1s train_loss=0.6939 train_accuracy=0.4993 dev_loss=0.6931 dev_accuracy=0.5065
Epoch 2/5 0.1s train_loss=0.6932 train_accuracy=0.5007 dev_loss=0.6931 dev_accuracy=0.5027
Epoch 3/5 0.1s train_loss=0.6929 train_accuracy=0.5115 dev_loss=0.6928 dev_accuracy=0.5483
Epoch 4/5 0.1s train_loss=0.6927 train_accuracy=0.5444 dev_loss=0.6928 dev_accuracy=0.5480
Epoch 5/5 0.1s train_loss=0.6925 train_accuracy=0.5403 dev_loss=0.6931 dev_accuracy=0.5407

python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5 --rnn_dim=16 --hidden_layer=50 --sequence_dim=3

Epoch 1/5 0.1s train_loss=0.6928 train_accuracy=0.4956 dev_loss=0.6903 dev_accuracy=0.5160
Epoch 2/5 0.1s train_loss=0.6886 train_accuracy=0.5182 dev_loss=0.6892 dev_accuracy=0.5152
Epoch 3/5 0.1s train_loss=0.6835 train_accuracy=0.5138 dev_loss=0.6785 dev_accuracy=0.5124
Epoch 4/5 0.1s train_loss=0.6691 train_accuracy=0.5493 dev_loss=0.6596 dev_accuracy=0.5347
Epoch 5/5 0.1s train_loss=0.6474 train_accuracy=0.5756 dev_loss=0.6342 dev_accuracy=0.5939

python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5 --rnn_dim=16 --hidden_layer=50 --sequence_dim=3 --clip_gradient=0.01

Epoch 1/5 0.1s train_loss=0.6928 train_accuracy=0.4946 dev_loss=0.6900 dev_accuracy=0.5055
Epoch 2/5 0.1s train_loss=0.6882 train_accuracy=0.5167 dev_loss=0.6871 dev_accuracy=0.5135
Epoch 3/5 0.1s train_loss=0.6814 train_accuracy=0.5083 dev_loss=0.6756 dev_accuracy=0.5107
Epoch 4/5 0.1s train_loss=0.6684 train_accuracy=0.5483 dev_loss=0.6609 dev_accuracy=0.5204
Epoch 5/5 0.1s train_loss=0.6504 train_accuracy=0.5754 dev_loss=0.6404 dev_accuracy=0.5936

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 sequence_classification.py --rnn=RNN --epochs=5

Epoch 1/5 1.3s train_loss=0.6938 train_accuracy=0.5073 dev_loss=0.6919 dev_accuracy=0.5121
Epoch 2/5 1.3s train_loss=0.6905 train_accuracy=0.5124 dev_loss=0.6885 dev_accuracy=0.5098
Epoch 3/5 1.3s train_loss=0.6854 train_accuracy=0.5122 dev_loss=0.6805 dev_accuracy=0.5167
Epoch 4/5 1.3s train_loss=0.6750 train_accuracy=0.5149 dev_loss=0.6706 dev_accuracy=0.5136
Epoch 5/5 1.3s train_loss=0.6688 train_accuracy=0.5148 dev_loss=0.6662 dev_accuracy=0.5129

python3 sequence_classification.py --rnn=GRU --epochs=5

Epoch 1/5 2.9s train_loss=0.6929 train_accuracy=0.5061 dev_loss=0.6920 dev_accuracy=0.5157
Epoch 2/5 2.8s train_loss=0.6872 train_accuracy=0.5183 dev_loss=0.6767 dev_accuracy=0.5343
Epoch 3/5 2.9s train_loss=0.2817 train_accuracy=0.8612 dev_loss=0.0412 dev_accuracy=1.0000
Epoch 4/5 2.9s train_loss=0.0271 train_accuracy=0.9998 dev_loss=0.0180 dev_accuracy=1.0000
Epoch 5/5 2.9s train_loss=0.0144 train_accuracy=0.9998 dev_loss=0.0104 dev_accuracy=1.0000

python3 sequence_classification.py --rnn=LSTM --epochs=5

Epoch 1/5 0.8s train_loss=0.6932 train_accuracy=0.5081 dev_loss=0.6929 dev_accuracy=0.5136
Epoch 2/5 0.8s train_loss=0.6927 train_accuracy=0.5135 dev_loss=0.6929 dev_accuracy=0.5142
Epoch 3/5 0.8s train_loss=0.6920 train_accuracy=0.5124 dev_loss=0.6914 dev_accuracy=0.5146
Epoch 4/5 0.8s train_loss=0.6903 train_accuracy=0.5108 dev_loss=0.6883 dev_accuracy=0.5084
Epoch 5/5 0.8s train_loss=0.6816 train_accuracy=0.5217 dev_loss=0.6724 dev_accuracy=0.5383

python3 sequence_classification.py --rnn=LSTM --epochs=5 --rnn_dim=30 --hidden_layer=75 --sequence_dim=30

Epoch 1/5 1.3s train_loss=0.6931 train_accuracy=0.5053 dev_loss=0.6928 dev_accuracy=0.5089
Epoch 2/5 1.3s train_loss=0.6830 train_accuracy=0.5165 dev_loss=0.6540 dev_accuracy=0.5399
Epoch 3/5 1.3s train_loss=0.6261 train_accuracy=0.5555 dev_loss=0.5964 dev_accuracy=0.5797
Epoch 4/5 1.3s train_loss=0.5761 train_accuracy=0.5900 dev_loss=0.5547 dev_accuracy=0.6158
Epoch 5/5 1.3s train_loss=0.5423 train_accuracy=0.6549 dev_loss=0.6601 dev_accuracy=0.5286

python3 sequence_classification.py --rnn=LSTM --epochs=5 --rnn_dim=30 --hidden_layer=75 --sequence_dim=30 --clip_gradient=1

Epoch 1/5 1.4s train_loss=0.6931 train_accuracy=0.5053 dev_loss=0.6928 dev_accuracy=0.5089
Epoch 2/5 1.4s train_loss=0.6830 train_accuracy=0.5165 dev_loss=0.6540 dev_accuracy=0.5399
Epoch 3/5 1.4s train_loss=0.6252 train_accuracy=0.5562 dev_loss=0.5943 dev_accuracy=0.5789
Epoch 4/5 1.4s train_loss=0.5732 train_accuracy=0.5932 dev_loss=0.5512 dev_accuracy=0.6212
Epoch 5/5 1.4s train_loss=0.2098 train_accuracy=0.8723 dev_loss=0.0015 dev_accuracy=1.0000

tagger_we

Deadline: Apr 16, 22:00 3 points

In this assignment you will create a simple part-of-speech tagger. For training and evaluation, we will use a Czech dataset containing tokenized sentences, with each word annotated by a gold lemma and part-of-speech tag. The morpho_dataset module (down)loads the dataset and uses Vocabulary to provide mappings between strings and integers.

Your goal is to modify the tagger_we.py template and implement the following:

Use the specified RNN layer type (GRU or LSTM) and dimensionality.
Create word embeddings for the training vocabulary.
Process the sentences using a bidirectional RNN.
Predict part-of-speech tags. Note that you need to properly handle sentences of different lengths in one batch.
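
Handling sentences of different lengths usually means padding them to a rectangle and ignoring the padded positions in the loss and metrics. A minimal sketch of the idea, assuming index 0 is reserved for padding (the helper names and the PAD constant are hypothetical):

```python
PAD = 0  # assumption: index 0 never denotes a real tag


def pad_batch(sequences, pad=PAD):
    """Pad lists of tag ids to the length of the longest sequence."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad] * (max_len - len(s)) for s in sequences]


def masked_accuracy(predictions, gold, pad=PAD):
    """Accuracy over real tokens only; padded gold positions are skipped."""
    correct = total = 0
    for pred_row, gold_row in zip(predictions, gold):
        for p, g in zip(pred_row, gold_row):
            if g == pad:
                continue
            total += 1
            correct += p == g
    return correct / total
```

In PyTorch, the ignore_index argument of torch.nn.CrossEntropyLoss serves the same purpose for the loss.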

In the alternative tagger_we.packed.py template, forward processes a PackedSequence instead of a rectangular tensor and also produces a PackedSequence; both templates deliver the same results, and both are accepted by ReCodEx.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 tagger_we.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16

Epoch 1/1 1.6s train_loss=2.3559 train_accuracy=0.3358 dev_loss=2.0420 dev_accuracy=0.4121

python3 tagger_we.py --epochs=1 --max_sentences=1000 --rnn=GRU --rnn_dim=16

Epoch 1/1 1.6s train_loss=2.1929 train_accuracy=0.3318 dev_loss=1.5136 dev_accuracy=0.5596

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 tagger_we.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=64

Epoch 1/5 11.8s train_loss=1.0064 train_accuracy=0.6990 dev_loss=0.4203 dev_accuracy=0.8638
Epoch 2/5 11.2s train_loss=0.1147 train_accuracy=0.9717 dev_loss=0.3357 dev_accuracy=0.8799
Epoch 3/5 11.6s train_loss=0.0319 train_accuracy=0.9912 dev_loss=0.3699 dev_accuracy=0.8677
Epoch 4/5 11.4s train_loss=0.0193 train_accuracy=0.9950 dev_loss=0.3772 dev_accuracy=0.8730
Epoch 5/5 11.5s train_loss=0.0122 train_accuracy=0.9969 dev_loss=0.4070 dev_accuracy=0.8704

python3 tagger_we.py --epochs=5 --max_sentences=5000 --rnn=GRU --rnn_dim=64

Epoch 1/5 12.0s train_loss=0.7531 train_accuracy=0.7726 dev_loss=0.3586 dev_accuracy=0.8909
Epoch 2/5 11.2s train_loss=0.0751 train_accuracy=0.9801 dev_loss=0.3172 dev_accuracy=0.8855
Epoch 3/5 11.1s train_loss=0.0232 train_accuracy=0.9927 dev_loss=0.3037 dev_accuracy=0.8971
Epoch 4/5 11.0s train_loss=0.0144 train_accuracy=0.9955 dev_loss=0.3446 dev_accuracy=0.8841
Epoch 5/5 11.3s train_loss=0.0088 train_accuracy=0.9974 dev_loss=0.3267 dev_accuracy=0.8940

tagger_cle

Deadline: Apr 16, 22:00 3 points

This assignment is a continuation of tagger_we. Using the tagger_cle.py template, implement character-level word embedding computation using a bidirectional character-level GRU.

Once submitted to ReCodEx, you should experiment with the effect of CLEs compared to a plain tagger_we, and the influence of their dimensionality. Note that tagger_cle has by default smaller word embeddings so that the size of word representation (64 + 32 + 32) is the same as in the tagger_we assignment.

Again, in the alternative tagger_cle.packed.py template, forward processes PackedSequences instead of rectangular tensors and also produces a PackedSequence; both templates deliver the same results when word masking is not used, and both are accepted by ReCodEx.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 tagger_cle.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16 --cle_dim=24

Epoch 1/1 2.0s train_loss=2.2294 train_accuracy=0.3722 dev_loss=1.8014 dev_accuracy=0.4973

python3 tagger_cle.py --epochs=1 --max_sentences=1000 --rnn=GRU --rnn_dim=16 --cle_dim=24 --word_masking=0.1

Epoch 1/1 1.9s train_loss=2.0588 train_accuracy=0.4126 dev_loss=1.4207 dev_accuracy=0.5601

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 tagger_cle.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=32 --cle_dim=32

Epoch 1/5 11.2s train_loss=1.0658 train_accuracy=0.6894 dev_loss=0.3472 dev_accuracy=0.9141
Epoch 2/5 10.9s train_loss=0.1342 train_accuracy=0.9714 dev_loss=0.1787 dev_accuracy=0.9469
Epoch 3/5 10.9s train_loss=0.0477 train_accuracy=0.9889 dev_loss=0.1627 dev_accuracy=0.9475
Epoch 4/5 11.0s train_loss=0.0296 train_accuracy=0.9923 dev_loss=0.1712 dev_accuracy=0.9393
Epoch 5/5 10.9s train_loss=0.0197 train_accuracy=0.9952 dev_loss=0.1713 dev_accuracy=0.9474

python3 tagger_cle.py --epochs=5 --max_sentences=5000 --rnn=GRU --rnn_dim=32 --cle_dim=32 --word_masking=0.1

Epoch 1/5 11.2s train_loss=0.8103 train_accuracy=0.7639 dev_loss=0.2349 dev_accuracy=0.9326
Epoch 2/5 10.8s train_loss=0.1409 train_accuracy=0.9599 dev_loss=0.1599 dev_accuracy=0.9493
Epoch 3/5 11.0s train_loss=0.0807 train_accuracy=0.9750 dev_loss=0.1511 dev_accuracy=0.9529
Epoch 4/5 11.1s train_loss=0.0613 train_accuracy=0.9800 dev_loss=0.1363 dev_accuracy=0.9562
Epoch 5/5 11.1s train_loss=0.0534 train_accuracy=0.9825 dev_loss=0.1499 dev_accuracy=0.9533

tagger_competition

Deadline: Apr 16, 22:00 4 points+5 bonus

In this assignment, you should extend tagger_cle into a real-world Czech part-of-speech tagger. We will use the Czech PDT dataset, loadable using the morpho_dataset module. Note that the dataset contains more than 1500 unique POS tags and that the POS tags have a fixed structure of 15 positions (so it is possible to generate the POS tag characters independently).
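
The idea of generating the 15 tag characters independently can be sketched like this: build one small vocabulary per position, predict an index for each position, and join the characters back into a tag. All helper names are hypothetical and the two tags are only illustrative examples of the 15-position format.

```python
def build_position_vocabs(tags):
    """One sorted character vocabulary for each of the tag positions."""
    length = len(tags[0])
    assert all(len(tag) == length for tag in tags)
    return [sorted({tag[i] for tag in tags}) for i in range(length)]


def encode_tag(tag, vocabs):
    """Turn a tag into one class index per position."""
    return [vocab.index(ch) for ch, vocab in zip(tag, vocabs)]


def decode_tag(indices, vocabs):
    """Join independently predicted characters back into a tag."""
    return "".join(vocab[i] for i, vocab in zip(indices, vocabs))
```

One trade-off of this factorization is that independently predicted characters may combine into a tag that never occurs in the training data.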

You can use the following additional data in this assignment:

You can use outputs of a morphological analyzer loadable with morpho_analyzer. If a word form in train, dev, or test PDT data is known to the analyzer, all its (lemma, POS tag) pairs are returned.
You can use any unannotated text data (Wikipedia, Czech National Corpus, …), and also any pre-trained word embeddings (assuming they were trained on plain texts).

The task is a competition. Everyone who submits a solution with at least 93.0% label accuracy gets 4 points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions. Lastly, 1 additional bonus point will be given to anyone surpassing the pre-neural-network state-of-the-art of 96.35%.

You can start with the tagger_competition.py template, which, among other things, generates test set annotations in the required format. Note that you can evaluate the predictions as usual using the morpho_dataset module, either by running python3 -m npfl138.datasets.morpho_dataset --evaluate=path --dataset=dev/test or by calling the MorphoDataset.evaluate method.

tensorboard_projector

You can try exploring the TensorBoard Projector with pre-trained embeddings for the 20k most frequent lemmas in Czech and English – after extracting the archive, start tensorboard --logdir dir_where_the_archive_is_extracted.

In order to use the Projector tab yourself, you can take inspiration from the projector_export.py script, which was used to export the above pre-trained embeddings from the Word2vec format.

tagger_ner

Deadline: ~~Apr 30~~ May 07, 22:00 2 points

This assignment is an extension of the tagger_we task. Using the tagger_ner.py template, implement optimal decoding of named entity spans from BIO-encoded tags. In a valid sequence, the tags are O, B-TYPE, I-TYPE, and the I-TYPE tag must follow either B-TYPE or I-TYPE tags.
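
A minimal pure-Python sketch of such decoding is a Viterbi search that forbids invalid transitions. It assumes that I-TYPE must follow a B- or I- tag of the same TYPE and that a sequence cannot start with I-TYPE; the function signatures are hypothetical and need not match the template. The nested loops are for readability only and would be too slow for ReCodEx, where a vectorized version is more appropriate.

```python
def is_valid_transition(prev, cur):
    """prev=None marks the start of the sequence."""
    if not cur.startswith("I-"):
        return True
    return prev is not None and prev[:2] in ("B-", "I-") and prev[2:] == cur[2:]


def constrained_decoding(log_probs, tags):
    """log_probs: one list of per-tag log-probabilities for every word."""
    neg_inf = float("-inf")
    n = len(tags)
    scores = [lp if is_valid_transition(None, tags[j]) else neg_inf
              for j, lp in enumerate(log_probs[0])]
    backpointers = []
    for step in log_probs[1:]:
        new_scores, pointers = [], []
        for j in range(n):
            # Best valid predecessor of tag j.
            best_i, best = 0, neg_inf
            for i in range(n):
                if scores[i] > best and is_valid_transition(tags[i], tags[j]):
                    best_i, best = i, scores[i]
            new_scores.append(best + step[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    j = max(range(n), key=scores.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return [tags[j] for j in reversed(path)]
```

Unlike greedy per-word argmax, this search returns the highest-scoring sequence among the valid ones.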

The evaluation is performed using the provided metric computing F1 score of the span prediction (i.e., a recognized possibly-multiword named entity is a true positive if both the entity type and the span exactly match).
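
The span-level F1 described above can be illustrated with the following sketch. It is not the provided metric: the helper names are hypothetical, and spans are represented as (start, end, type) triples with an exclusive end.

```python
def extract_spans(tags):
    """Collect (start, end, type) spans from a BIO tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and tag[2:] == kind
        if start is not None and not continues:
            spans.add((start, i, kind))
            start = kind = None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def span_f1(gold_sequences, pred_sequences):
    """Micro-averaged F1; a span counts only on an exact span and type match."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

This also shows why token accuracy can be high while F1 stays near zero: a single wrong tag inside a multiword entity invalidates the whole span.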

In practice, character-level embeddings (and also pre-trained word embeddings) would be used to obtain superior results.

To make debugging easier, the first test below includes a link to tag sequences predicted on the development set using the optimal decoding; you can print the tag sequences your solution predicts using the --show_predictions option.

Your implementation of constrained_decoding must be fast enough because during ReCodEx evaluation it is called 30 times on every batch.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 tagger_ner.py --epochs=2 --max_sentences=10 --seed=45

Epoch 1/2 0.0s train_loss=2.1919 train_accuracy=0.1314 dev_loss=2.1578 dev_accuracy=0.0566 dev_f1_constrained=0.0552 dev_f1_greedy=0.0420
Epoch 2/2 0.0s train_loss=2.1201 train_accuracy=0.9086 dev_loss=2.0992 dev_accuracy=0.4292 dev_f1_constrained=0.0435 dev_f1_greedy=0.0215

The optimally decoded tag sequences on the development set

python3 tagger_ner.py --epochs=2 --max_sentences=2000 --batch_size=25 --label_smoothing=0.1 --seed=45

Epoch 1/2 4.4s train_loss=1.5484 train_accuracy=0.7966 dev_loss=1.2449 dev_accuracy=0.8227 dev_f1_constrained=0.0000 dev_f1_greedy=0.0000
Epoch 2/2 4.6s train_loss=1.1883 train_accuracy=0.8105 dev_loss=1.1551 dev_accuracy=0.8238 dev_f1_constrained=0.0211 dev_f1_greedy=0.0182

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 tagger_ner.py --epochs=5 --seed=45

Epoch 1/5 32.5s train_loss=0.7204 train_accuracy=0.8358 dev_loss=0.5612 dev_accuracy=0.8450 dev_f1_constrained=0.2035 dev_f1_greedy=0.1596
Epoch 2/5 39.4s train_loss=0.3630 train_accuracy=0.8907 dev_loss=0.4482 dev_accuracy=0.8784 dev_f1_constrained=0.4592 dev_f1_greedy=0.4040
Epoch 3/5 38.5s train_loss=0.1834 train_accuracy=0.9469 dev_loss=0.4306 dev_accuracy=0.8913 dev_f1_constrained=0.4959 dev_f1_greedy=0.4563
Epoch 4/5 46.4s train_loss=0.0904 train_accuracy=0.9743 dev_loss=0.4398 dev_accuracy=0.8877 dev_f1_constrained=0.4983 dev_f1_greedy=0.4499
Epoch 5/5 44.1s train_loss=0.0505 train_accuracy=0.9857 dev_loss=0.4580 dev_accuracy=0.8917 dev_f1_constrained=0.5049 dev_f1_greedy=0.4601

python3 tagger_ner.py --epochs=5 --batch_size=25 --label_smoothing=0.1 --seed=45

Epoch 1/5 24.4s train_loss=1.2252 train_accuracy=0.8285 dev_loss=1.0832 dev_accuracy=0.8320 dev_f1_constrained=0.1185 dev_f1_greedy=0.0970
Epoch 2/5 20.2s train_loss=0.9486 train_accuracy=0.8621 dev_loss=0.9791 dev_accuracy=0.8667 dev_f1_constrained=0.3262 dev_f1_greedy=0.3106
Epoch 3/5 20.0s train_loss=0.8015 train_accuracy=0.9226 dev_loss=0.9255 dev_accuracy=0.8851 dev_f1_constrained=0.4554 dev_f1_greedy=0.4225
Epoch 4/5 19.8s train_loss=0.7081 train_accuracy=0.9596 dev_loss=0.8915 dev_accuracy=0.8947 dev_f1_constrained=0.5209 dev_f1_greedy=0.4865
Epoch 5/5 21.9s train_loss=0.6566 train_accuracy=0.9776 dev_loss=0.8882 dev_accuracy=0.8962 dev_f1_constrained=0.5254 dev_f1_greedy=0.4984

ctc_manual

Deadline: ~~Apr 30~~ May 07, 22:00 3 points

This assignment is an extension of the tagger_we task. Using the ctc_manual.py template, manually implement the CTC loss computation and also greedy CTC decoding. You can use torch.nn.CTCLoss during development as a reference, but it is not available during ReCodEx evaluation.
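
Greedy CTC decoding takes the most probable class in every frame, collapses consecutive repeats, and then removes blanks. A minimal sketch, assuming the blank has index 0 (the template may use a different index, and the function name is hypothetical):

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """frame_scores: one list of per-class scores for every frame."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    decoded, previous = [], None
    for label in best:
        # Keep a label only if it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

Note that a blank between two identical labels keeps them separate, which is how CTC represents repeated symbols.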

To make debugging easier, the first test below includes a link to a file containing $α_-$ , $α_*$ , final $α$ , and losses for all compute_loss calls.

Your implementation of compute_loss must be fast enough because during ReCodEx evaluation it is called 30 times on every batch.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 ctc_manual.py --epochs=1 --max_sentences=30

Epoch 1/1 0.2s train_loss=26.8515 train_edit_distance=1.6522 dev_loss=16.7245 dev_edit_distance=0.6000

Here you can find for every example in every batch its:

matrices $α_-$ and $α_*$ , each row on a single line;
scalar $α^N(M)$ , the log likelihood of all extended labelings corresponding to the gold regular label;
final example loss normalized by the target sequence length.
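
For orientation, the forward computation behind these quantities can be sketched in log space for a single example. This sketch keeps one interleaved array over the extended labeling (blank, label, blank, ...) rather than separate $α_-$ and $α_*$ matrices, assumes the blank has index 0, and returns the unnormalized negative log likelihood; dividing by len(target) would give a length-normalized loss. The function names are hypothetical and a real solution needs a batched tensor version.

```python
import math


def logsumexp(*values):
    top = max(values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """log_probs: N frames x C classes of log-probabilities; target: M labels."""
    neg_inf = float("-inf")
    extended = [blank]
    for label in target:
        extended += [label, blank]
    size = len(extended)
    # First frame: start with a blank or with the first label.
    alpha = [neg_inf] * size
    alpha[0] = log_probs[0][blank]
    if size > 1:
        alpha[1] = log_probs[0][extended[1]]
    for frame in log_probs[1:]:
        new_alpha = [neg_inf] * size
        for s in range(size):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(*candidates) + frame[extended[s]]
        alpha = new_alpha
    # Valid alignments end in the last label or in the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if size > 1 else alpha[-1]
    return -total
```

A useful check during development is to compare the result with a brute-force sum over all alignments of a tiny example, or with torch.nn.CTCLoss.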

python3 ctc_manual.py --epochs=1 --max_sentences=1000 --batch_size=100

Epoch 1/1 2.8s train_loss=26.5719 train_edit_distance=1.2628 dev_loss=17.6995 dev_edit_distance=0.5864

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 ctc_manual.py --epochs=5

Epoch 1/5 38.7s train_loss=2.4757 train_edit_distance=0.5985 dev_loss=1.6736 dev_edit_distance=0.5688
Epoch 2/5 41.3s train_loss=1.3058 train_edit_distance=0.4890 dev_loss=1.3966 dev_edit_distance=0.4434
Epoch 3/5 48.9s train_loss=0.7655 train_edit_distance=0.3126 dev_loss=1.3873 dev_edit_distance=0.4193
Epoch 4/5 45.6s train_loss=0.4370 train_edit_distance=0.1745 dev_loss=1.6149 dev_edit_distance=0.4158
Epoch 5/5 48.5s train_loss=0.2641 train_edit_distance=0.1024 dev_loss=1.8303 dev_edit_distance=0.4081

speech_recognition

Deadline: Apr 23, 22:00 4 points+5 bonus

This assignment is a competition task in the speech recognition area. Specifically, your goal is to predict a sequence of letters given a spoken utterance. We will be using Czech recordings from Common Voice, with input sound waves passed through the usual preprocessing – computing Mel-frequency cepstral coefficients (MFCCs). You can repeat this preprocessing on a given audio using the load_audio and mfcc_extract methods from the common_voice_cs module. This module can also load the dataset, downloading it when necessary (note that it is 200MB in size, so it might take a while). Furthermore, you can listen to the development portion of the dataset. Lastly, the whole dataset is available for download in MP3 format (you are not expected to download it unless you would like to perform some custom preprocessing).

The following additional data can be used in this assignment:

You can use any unannotated text data (Wikipedia, Czech National Corpus, …), and also any pre-trained word embeddings or language models (assuming they were trained on plain texts).
You can use any unannotated speech data.

The task is a competition. The evaluation is performed by computing the edit distance to the gold letter sequence, normalized by its length (a corresponding metric EditDistanceMetric is provided by the common_voice_cs module). Everyone who submits a solution with at most 45% test set edit distance gets 4 points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions. Note that you can evaluate the predictions as usual using the common_voice_cs module, either by running python3 -m npfl138.datasets.common_voice_cs --evaluate=path --dataset=dev/test or by calling the CommonVoiceCs.evaluate method.
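
The per-utterance metric can be sketched as a Levenshtein distance divided by the gold length. The helper names are hypothetical, and how the provided EditDistanceMetric aggregates the values over the whole dataset is not shown here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction, gold):
    """Edit distance to the gold sequence divided by the gold length."""
    return edit_distance(prediction, gold) / len(gold)
```

Because of the normalization by gold length, a prediction much longer than the gold transcript can score above 1.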

Start with the speech_recognition.py template containing a structure suitable for computing the CTC loss and performing CTC decoding. You can use torch.nn.CTCLoss to compute the loss and you can use torchaudio.models.decoder.CTCDecoder/torchaudio.models.decoder.CUCTCDecoder to perform beam-search decoding.

lemmatizer_noattn

Deadline: Apr 30, 22:00 3 points

The goal of this assignment is to create a simple lemmatizer. For training and evaluation, we use the same dataset as in tagger_we loadable again by the morpho_dataset module.

Your goal is to modify the lemmatizer_noattn.py template and implement the following:

Embed characters of source forms and run a bidirectional GRU encoder.
Embed characters of target lemmas.
Implement a training time decoder which uses gold target characters as inputs.
Implement an inference time decoder which uses previous predictions as inputs.
The initial state of both decoders is the output state of the GRU encoder for the corresponding form.
If requested, tie the embeddings in the decoder.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 lemmatizer_noattn.py --epochs=1 --max_sentences=500 --batch_size=2 --cle_dim=64 --rnn_dim=32

Epoch 1/1 2.2s train_loss=2.9629 train_accuracy=0.0228 dev_accuracy=0.1324

python3 lemmatizer_noattn.py --epochs=1 --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32 --tie_embeddings

Epoch 1/1 2.0s train_loss=2.8765 train_accuracy=0.0370 dev_accuracy=0.1570

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 lemmatizer_noattn.py --epochs=3 --max_sentences=5000

Epoch 1/3 10.6s train_loss=2.2199 train_accuracy=0.1772 dev_accuracy=0.3221
Epoch 2/3 11.6s train_loss=0.9341 train_accuracy=0.4397 dev_accuracy=0.4890
Epoch 3/3 12.3s train_loss=0.5396 train_accuracy=0.5995 dev_accuracy=0.6037

python3 lemmatizer_noattn.py --epochs=3 --max_sentences=5000 --tie_embeddings

Epoch 1/3 14.3s train_loss=1.8783 train_accuracy=0.2614 dev_accuracy=0.3906
Epoch 2/3 14.4s train_loss=0.7635 train_accuracy=0.5107 dev_accuracy=0.5269
Epoch 3/3 22.8s train_loss=0.4795 train_accuracy=0.6406 dev_accuracy=0.6186

lemmatizer_attn

Deadline: Apr 30, 22:00 3 points

This task is a continuation of the lemmatizer_noattn assignment. Using the lemmatizer_attn.py template, implement the following features in addition to lemmatizer_noattn:

The bidirectional GRU encoder returns outputs for all input characters, not just the last.
Implement attention in the decoders. Notably, project the encoder outputs and the current state into same-dimensionality vectors, apply a non-linearity, and generate weights for every encoder output. Finally, sum the encoder outputs using these weights and concatenate the computed attention to the decoder inputs.
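
The attention steps above can be sketched for a single decoder step in plain Python. The sketch assumes additive attention with tanh as the non-linearity and a softmax over the scores; the parameter names W_enc, W_state, and v are hypothetical, and the template's layers may be organized differently.

```python
import math


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(encoder_outputs, state, W_enc, W_state, v):
    """Return (context, weights) for one decoder state.

    encoder_outputs: one vector per input character; state: decoder state.
    W_enc and W_state project both into the same dimensionality; v maps the
    combined vector to a scalar score.
    """
    projected_state = matvec(W_state, state)
    scores = []
    for output in encoder_outputs:
        hidden = [math.tanh(a + b)
                  for a, b in zip(matvec(W_enc, output), projected_state)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax over encoder positions (shifted for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the encoder outputs.
    dim = len(encoder_outputs[0])
    context = [sum(w * out[d] for w, out in zip(weights, encoder_outputs))
               for d in range(dim)]
    return context, weights
```

The resulting context vector is what gets concatenated to the decoder input at that step.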

Once submitted to ReCodEx, you should experiment with the effect of using the attention, and the influence of RNN dimensionality on network performance.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 lemmatizer_attn.py --epochs=1 --max_sentences=500 --batch_size=2 --cle_dim=64 --rnn_dim=32

Epoch 1/1 3.4s train_loss=2.9698 train_accuracy=0.0481 dev_accuracy=0.1655

python3 lemmatizer_attn.py --epochs=1 --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32 --tie_embeddings

Epoch 1/1 3.2s train_loss=2.8633 train_accuracy=0.0313 dev_accuracy=0.1530

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 lemmatizer_attn.py --epochs=3 --max_sentences=5000

Epoch 1/3 27.1s train_loss=1.8938 train_accuracy=0.2621 dev_accuracy=0.5941
Epoch 2/3 36.8s train_loss=0.4003 train_accuracy=0.6906 dev_accuracy=0.7295
Epoch 3/3 29.5s train_loss=0.2510 train_accuracy=0.7734 dev_accuracy=0.7821

python3 lemmatizer_attn.py --epochs=3 --max_sentences=5000 --tie_embeddings

Epoch 1/3 21.7s train_loss=1.5409 train_accuracy=0.3568 dev_accuracy=0.6244
Epoch 2/3 31.2s train_loss=0.3149 train_accuracy=0.7330 dev_accuracy=0.7685
Epoch 3/3 25.3s train_loss=0.1996 train_accuracy=0.8066 dev_accuracy=0.7966

lemmatizer_competition

Deadline: Apr 30, 22:00 4 points+5 bonus

In this assignment, you should extend lemmatizer_noattn or lemmatizer_attn into a real-world Czech lemmatizer. As in tagger_competition, we will use the Czech PDT dataset, loadable using the morpho_dataset module.

You can also use the same additional data as in the tagger_competition assignment.

The task is a competition. Everyone who submits a solution with at least 97.0% exact match accuracy gets 4 points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state-of-the-art of 98.76%.

You can start with the lemmatizer_competition.py template, which, among other things, generates test set annotations in the required format. Note that you can evaluate the predictions as usual using the morpho_dataset module, either by running python3 -m npfl138.datasets.morpho_dataset --task=lemmatizer --evaluate=path --dataset=dev/test or by calling the MorphoDataset.evaluate method.

tagger_transformer

Deadline: May 07, 22:00 3 points

This assignment is a continuation of tagger_we. Using the tagger_transformer.py template, implement a Pre-LN Transformer encoder.
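
The defining property of a Pre-LN layer is that layer normalization is applied to the input of each sublayer, while the residual path itself stays unnormalized. A toy sketch for a single token vector follows, with the attention and feed-forward sublayers passed in as placeholder callables; all names are hypothetical, and the normalization omits the learned scale and bias.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(variance + eps) for v in x]


def pre_ln_layer(x, attention, feed_forward):
    """x + attention(LN(x)), followed by x + feed_forward(LN(x))."""
    x = [a + b for a, b in zip(x, attention(layer_norm(x)))]
    x = [a + b for a, b in zip(x, feed_forward(layer_norm(x)))]
    return x
```

In contrast, a Post-LN layer would compute LN(x + sublayer(x)), which normalizes the residual path as well.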

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 tagger_transformer.py --epochs=1 --max_sentences=800 --transformer_layers=0

Epoch 1/1 0.2s train_loss=2.4731 train_accuracy=0.2306 dev_loss=2.0946 dev_accuracy=0.3755

python3 tagger_transformer.py --epochs=1 --max_sentences=800 --transformer_heads=1

Epoch 1/1 0.8s train_loss=2.1833 train_accuracy=0.3348 dev_loss=1.9706 dev_accuracy=0.3419

python3 tagger_transformer.py --epochs=1 --max_sentences=800 --transformer_heads=4

Epoch 1/1 0.9s train_loss=2.1739 train_accuracy=0.3406 dev_loss=1.9658 dev_accuracy=0.3455

python3 tagger_transformer.py --epochs=1 --max_sentences=800 --transformer_heads=4 --transformer_dropout=0.1

Epoch 1/1 0.9s train_loss=2.2749 train_accuracy=0.3127 dev_loss=1.9806 dev_accuracy=0.3611

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

python3 tagger_transformer.py --max_sentences=5000 --transformer_layers=0

Epoch 1/5 3.5s train_loss=1.5338 train_accuracy=0.5334 dev_loss=0.8756 dev_accuracy=0.7211
Epoch 2/5 3.1s train_loss=0.5380 train_accuracy=0.8625 dev_loss=0.5533 dev_accuracy=0.8253
Epoch 3/5 3.0s train_loss=0.2495 train_accuracy=0.9569 dev_loss=0.4516 dev_accuracy=0.8419
Epoch 4/5 3.1s train_loss=0.1323 train_accuracy=0.9784 dev_loss=0.4188 dev_accuracy=0.8474
Epoch 5/5 3.6s train_loss=0.0789 train_accuracy=0.9849 dev_loss=0.4027 dev_accuracy=0.8480

python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=1

Epoch 1/5 8.2s train_loss=1.1116 train_accuracy=0.6405 dev_loss=0.6202 dev_accuracy=0.7765
Epoch 2/5 7.6s train_loss=0.2422 train_accuracy=0.9208 dev_loss=0.4869 dev_accuracy=0.8166
Epoch 3/5 7.6s train_loss=0.0767 train_accuracy=0.9761 dev_loss=0.4880 dev_accuracy=0.8270
Epoch 4/5 7.7s train_loss=0.0441 train_accuracy=0.9855 dev_loss=0.5365 dev_accuracy=0.8418
Epoch 5/5 7.7s train_loss=0.0353 train_accuracy=0.9876 dev_loss=0.5419 dev_accuracy=0.8410

python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=4

Epoch 1/5 8.1s train_loss=1.0783 train_accuracy=0.6540 dev_loss=0.6045 dev_accuracy=0.7882
Epoch 2/5 6.2s train_loss=0.1865 train_accuracy=0.9400 dev_loss=0.5526 dev_accuracy=0.8086
Epoch 3/5 6.3s train_loss=0.0632 train_accuracy=0.9795 dev_loss=0.6172 dev_accuracy=0.8175
Epoch 4/5 6.2s train_loss=0.0400 train_accuracy=0.9862 dev_loss=0.8000 dev_accuracy=0.8410
Epoch 5/5 6.4s train_loss=0.0322 train_accuracy=0.9893 dev_loss=0.7473 dev_accuracy=0.8466

python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=4 --transformer_dropout=0.1

Epoch 1/5 9.2s train_loss=1.1677 train_accuracy=0.6217 dev_loss=0.6096 dev_accuracy=0.7763
Epoch 2/5 8.7s train_loss=0.2310 train_accuracy=0.9253 dev_loss=0.5208 dev_accuracy=0.8134
Epoch 3/5 8.4s train_loss=0.0784 train_accuracy=0.9762 dev_loss=0.5758 dev_accuracy=0.8335
Epoch 4/5 8.5s train_loss=0.0506 train_accuracy=0.9838 dev_loss=0.5275 dev_accuracy=0.8334
Epoch 5/5 8.5s train_loss=0.0423 train_accuracy=0.9858 dev_loss=0.6932 dev_accuracy=0.8212

sentiment_analysis

Deadline: May 07, 22:00 2 points

Perform sentiment analysis on Czech Facebook data using a provided pre-trained Czech Electra model eleczech-lc-small. The dataset consists of pairs of (document, label) and can be (down)loaded using the text_classification_dataset module.

Even though this assignment is not a competition, your goal is to submit test set annotations with at least 77% accuracy. As usual, you can evaluate your predictions using the text_classification_dataset module, either by running python3 -m npfl138.datasets.text_classification_dataset --evaluate=path --dataset=dev/test or by calling the TextClassificationDataset.evaluate method.

Note that contrary to working with EfficientNet, you need to finetune the Electra model in order to achieve the required accuracy.

You can start with the sentiment_analysis.py template, which, among other things, loads the Electra Czech model and generates test set annotations in the required format. Note that the example_transformers.py module illustrates the usage of both the Electra tokenizer and the Electra model.

reading_comprehension

Deadline: May 07, 22:00 4 points+5 bonus

Implement the best possible model for the reading comprehension task using an automatically translated version of the SQuAD 1.1 dataset, utilizing a provided Czech RoBERTa model ufal/robeczech-base.

The dataset can be loaded using the reading_comprehension_dataset module. The loaded dataset is a direct representation of the data and not yet ready to be directly trained on. Each of the train, dev and test datasets is composed of a list of paragraphs, each consisting of:

context: text with various information;
qas: list of questions and answers, where each item consists of:
- question: text of the question;
- answers: a list of answers, each answer is composed of:
  - text: answer text as string, exactly as appearing in the context;
  - start: character offset of the answer text in the context.
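
As an illustration of this structure, the sketch below walks one paragraph and converts every answer into a character span. It assumes the paragraph is a plain dictionary with the field names listed above; the example paragraph is made up, and the actual objects returned by the module may differ.

```python
# Made-up example paragraph following the field names described above.
paragraph = {
    "context": "Praha je hlavní město České republiky.",
    "qas": [
        {
            "question": "Co je hlavní město České republiky?",
            "answers": [{"text": "Praha", "start": 0}],
        }
    ],
}


def answer_spans(paragraph):
    """Yield (question, start, end) character spans, verifying each offset."""
    context = paragraph["context"]
    for qa in paragraph["qas"]:
        for answer in qa["answers"]:
            start = answer["start"]
            end = start + len(answer["text"])
            # The answer text should appear verbatim at the given offset.
            assert context[start:end] == answer["text"]
            yield qa["question"], start, end
```

Character spans like these still have to be mapped to token positions of the tokenizer before training a span-prediction model.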

In the train and dev sets, each question has exactly one answer, while in the test set, there might be several answers. We evaluate the reading comprehension task using accuracy, where an answer is considered correct if its text is exactly equal to some correct answer. You can evaluate your predictions as usual with the reading_comprehension_dataset.py module, either by running python3 -m npfl138.datasets.reading_comprehension_dataset --evaluate=path --dataset=dev/test or by calling the ReadingComprehensionDataset.evaluate method.

The task is a competition. Everyone who submits a solution with at least 65% answer accuracy gets 4 points; the remaining 5 bonus points are distributed depending on relative ordering of your solutions. Note that usually achieving 62% on the dev set is enough to get 65% on the test set (because of multiple references in the test set).

Note that contrary to working with EfficientNet, you need to finetune the RobeCzech model in order to achieve the required accuracy.

You can start with the reading_comprehension.py template, which, among other things, (down)loads the data and the RobeCzech model, and describes the format of the required test set annotations.

In the competitions, your goal is to train a model, and then predict target values on the given unannotated test set.

Submitting to ReCodEx

When submitting a competition solution to ReCodEx, you can include any number of files of any kind, and either submit them individually or compress them in a .zip file. However, there should be exactly one text file with the test set annotation (.txt) and at least one Python source (.py/ipynb) containing the model training and prediction. The Python sources are not executed, but must be included for inspection.
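
Before uploading, the two file conditions can be checked locally. The following is a minimal sketch, not the check ReCodEx itself performs; the function name check_competition_submission is hypothetical, the comparison of suffixes is case-sensitive by assumption, and the file names are expected to be those of the individual files (i.e., the contents of the .zip archive, if one is used).

```python
from pathlib import PurePath


def check_competition_submission(filenames):
    """Return a list of problems with the submitted file names (empty if none)."""
    suffixes = [PurePath(name).suffix for name in filenames]
    problems = []
    annotations = suffixes.count(".txt")
    if annotations != 1:
        problems.append(f"expected exactly one .txt file, found {annotations}")
    if not any(suffix in (".py", ".ipynb") for suffix in suffixes):
        problems.append("expected at least one .py or .ipynb source")
    return problems


good = check_competition_submission(
    ["reading_comprehension.py", "reading_comprehension_test.txt"]
)
bad = check_competition_submission(["notes.txt", "predictions.txt"])
print(good)
print(bad)
```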

Competition Evaluation

For every submission, ReCodEx checks the above conditions (exactly one .txt, at least one .py/ipynb) and whether the given annotations can be evaluated without error. If not, it will report the corresponding error in the logs.
Before the first deadline, ReCodEx prints the exact achieved performance, but only if it is worse than the baseline.

If you surpass the baseline, the assignment is marked as solved in ReCodEx and you immediately get regular points for the assignment. However, ReCodEx does not print the reached performance.
After the first deadline, the latest submission of every user surpassing the required baseline participates in a competition. Additional bonus points are then awarded according to the ordering of the performance of the participating submissions.
After the competition results announcement, ReCodEx starts to show the exact performance for all the already submitted solutions and also for the solutions submitted later.

What Is Allowed

You can use only the given annotated data for training and evaluation.
You can use the given annotated training data in any way.
You can use the given annotated development data for evaluation or hyperparameter tuning, but not for the training itself.
Additionally, you can use any unannotated or manually created data for training and evaluation.
The test set annotations must be the result of your system (so you cannot manually correct them; but your system can contain other parts than just trained models, like hand-written rules).
Do not use test set annotations in any way, if you somehow get access to them.
Unless stated otherwise, you can use any architecture to solve the competition task at hand, but the implementation must be created by you and you must understand it fully. You can of course take inspiration from any paper or existing implementation, but please reference it in that case.
- You can of course use anything from the PyTorch package (but unless stated otherwise, do not use models from torchvision, timm, torchaudio, …).
- You can use any data augmentation (even implementations not written by you).
- You can use any optimizer and any hyperparameter optimization method (even implementations not written by you).
If you utilize an already trained model, it must be trained only on the allowed training data, unless stated otherwise.

Install

What Python version to use

The recommended Python version is 3.11. This version is used by ReCodEx to evaluate your solutions. Minimum required version is Python 3.10, and the newest version that currently (as of Feb 20) works is Python 3.12 (because some dependencies do not yet provide precompiled binary packages for Python 3.13).

You can find out the version of your Python installation using python3 --version.
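
The same check can be done from inside Python. The sketch below compares the running interpreter with the version range stated above; the function name python_version_status and the constants are illustrative, and the upper bound reflects only the situation described in the text (as of Feb 20).

```python
import sys

MINIMUM = (3, 10)  # minimum required version according to the text above
NEWEST = (3, 12)   # newest version reported to work as of Feb 20


def python_version_status(version=None):
    """Classify a (major, minor, ...) version against the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor < MINIMUM:
        return "too old"
    if major_minor > NEWEST:
        return "too new"
    return "supported"


print(sys.version.split()[0], python_version_status())
```
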
Installing to central user packages repository

You can install all required packages to the central user packages repository using python3 -m pip install --user --no-cache-dir --extra-index-url=https://download.pytorch.org/whl/cu118 npfl138.

On Linux and Windows, the above command installs CUDA 11.8 PyTorch build, but you can change cu118 to:
- cpu to get CPU-only (smaller) version,
- cu124 to get CUDA 12.4 build,
- rocm6.2.4 to get AMD ROCm 6.2.4 build (Linux only).
On macOS, the --extra-index-url has no effect and the Metal support is installed in any case.

To update the npfl138 package later, use python3 -m pip install --user --upgrade npfl138.
Installing to a virtual environment

Python supports virtual environments, which are directories containing independent sets of installed packages. You can create a virtual environment by running python3 -m venv VENV_DIR followed by VENV_DIR/bin/pip install --no-cache-dir --extra-index-url=https://download.pytorch.org/whl/cu118 npfl138 (or VENV_DIR/Scripts/pip on Windows).

Again, apart from the CUDA 11.8 build, you can change cu118 on Linux and Windows to:
- cpu to get CPU-only (smaller) version,
- cu124 to get CUDA 12.4 build,
- rocm6.2.4 to get AMD ROCm 6.2.4 build (Linux only).
To update the npfl138 package later, use VENV_DIR/bin/pip install --upgrade npfl138.
Windows installation
- On Windows, it can happen that python3 is not in PATH, while py command is – in that case you can use py -m venv VENV_DIR, which uses the newest Python available, or for example py -3.11 -m venv VENV_DIR, which uses Python version 3.11.
- If you encounter a problem creating the logs in the args.logdir directory, a possible cause is that the path is longer than 260 characters, which is the default maximum length of a complete path on Windows. However, you can increase this limit on Windows 10, version 1607 or later, by following the instructions.
MacOS installation
- If you encounter issues with SSL certificates (certificate verify failed: self-signed certificate in certificate chain), you probably need to run the Install Certificates.command, which should be executed after installation; see https://docs.python.org/3/using/mac.html#installation-steps.
GPU support on Linux and Windows

PyTorch supports NVIDIA and AMD GPUs out of the box; you just need to select the appropriate --extra-index-url when installing the packages.

If you encounter problems loading CUDA or cuDNN libraries, make sure your LD_LIBRARY_PATH does not contain paths to older CUDA/cuDNN libraries.

MetaCentrum

How to apply for MetaCentrum account?

After reading the Terms and conditions, you can apply for an account here.

After your account is created, please make sure that the directories containing your solutions are always private.
How to activate Python 3.11 on MetaCentrum?

On MetaCentrum, the newest Python currently available is 3.11, which you need to activate in every session by running the following command:
```
module add python/3.11.11-gcc-10.2.1-555dlyc
```
How to install the required virtual environment on MetaCentrum?

To create a virtual environment, you first need to decide where it will reside. You can either use permanent storage where you have a large enough quota, or use scratch storage for a submitted job.

TL;DR:
- Run an interactive CPU job, asking for 16GB scratch space:
```
qsub -l select=1:ncpus=1:mem=8gb:scratch_local=16gb -I
```
- In the job, use the allocated scratch space as the temporary directory:
```
export TMPDIR=$SCRATCHDIR
```
- You should clear the scratch space before you exit using the clean_scratch command. You can instruct the shell to call it automatically by running:
```
trap 'clean_scratch' TERM EXIT
```
- Finally, create the virtual environment and install PyTorch in it:
```
module add python/3.11.11-gcc-10.2.1-555dlyc
python3 -m venv CHOSEN_VENV_DIR
CHOSEN_VENV_DIR/bin/pip install --no-cache-dir --extra-index-url=https://download.pytorch.org/whl/cu118 npfl138
```
How to run a GPU computation on MetaCentrum?

First, read the official MetaCentrum documentation: Basic terms, Run simple job, GPU computing, GPU clusters.

TL;DR: To run an interactive GPU job with 1 CPU, 1 GPU, 8GB RAM, and 32GB scratch space, run:
```
qsub -l select=1:ncpus=1:ngpus=1:mem=8gb:scratch_local=32gb -I
```
To run a script in a non-interactive way, replace the -I option with the script to be executed.

If you want to run a CPU-only computation, remove the ngpus=1: from the above commands.

AIC

How to install required packages on AIC?

Python 3.11.7 is available at /opt/python/3.11.7/bin/python3, so you should start by creating a virtual environment using
```
/opt/python/3.11.7/bin/python3 -m venv VENV_DIR
```
and then install the required packages in it using
```
VENV_DIR/bin/pip install --no-cache-dir --extra-index-url=https://download.pytorch.org/whl/cu118 npfl138
```
How to run a GPU computation on AIC?

First, read the official AIC documentation: Submitting CPU Jobs, Submitting GPU Jobs.

TL;DR: To run an interactive GPU job with 1 CPU, 1 GPU, and 16GB RAM, run:
```
srun -p gpu -c1 -G1 --mem=16G --pty bash
```
To run a shell script requiring a GPU in a non-interactive way, use
```
sbatch -p gpu -c1 -G1 --mem=16G SCRIPT_PATH
```
If you want to run a CPU-only computation, remove the -p gpu and -G1 from the above commands.

Git

Is it possible to keep the solutions in a Git repository?

Definitely. Keeping the solutions in a branch of your repository, where you merge them with the course repository, is probably a good idea. However, please keep the cloned repository with your solutions private.
On GitHub, do not create a public fork with your solutions

If you keep your solutions in a GitHub repository, please do not create a clone of the repository by using the Fork button – this way, the cloned repository would be public.

Of course, if you just want to create a pull request, GitHub requires a public fork and that is fine – just do not store your solutions in it.
How to clone the course repository?

To clone the course repository, run
```
git clone https://github.com/ufal/npfl138
```
This creates the repository in the npfl138 subdirectory; if you want a different name, add it as a last parameter.

To update the repository, run git pull inside the repository directory.
How to keep the course repository as a branch in your repository?

If you want to store the course repository just in a local branch of your existing repository, you can run the following commands while in it:
```
git remote add course_repo https://github.com/ufal/npfl138
git fetch course_repo
git checkout --track course_repo/master -b BRANCH_NAME
```
This creates a branch BRANCH_NAME, and when you run git pull in that branch, it will be updated to the current state of the course repository.
How to merge the course repository updates with your modified branch?

If you want to store your solutions in your branch and gradually update this branch to track the changes in the course repository, you should start by
```
git remote add course_repo https://github.com/ufal/npfl138
git fetch course_repo
git checkout --no-track course_repo/master -b BRANCH_NAME
```
which creates a branch BRANCH_NAME with the current state of the course repository. However, unlike in the previous case, git pull and git push in this branch will not operate on the course repository. Therefore, you can then commit to this branch and push it to your own repository.

To update your branch with the changes from the course repository, run
```
git fetch course_repo
git merge course_repo/master
```
while in your branch. Of course, it might be necessary to resolve conflicts if both you and I modified the same lines in the templates.

ReCodEx

What files can be submitted to ReCodEx?

You can submit multiple files of any type to ReCodEx. There is a limit of 20 files per submission, with a total size of 20MB.
What file does ReCodEx execute and what arguments does it use?

Exactly one file with the .py suffix must contain a line starting with def main(. Such a file is imported by ReCodEx and the main function is executed (during the import, __name__ == "__recodex__").

The file must also export an argument parser called parser. ReCodEx uses its arguments and default values, but it overwrites some of the arguments depending on the test being executed – the template should always indicate which arguments are set by ReCodEx and which are left intact.
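
A minimal sketch of a file satisfying these requirements might look as follows. The two arguments and the body of main are placeholders invented for illustration; the actual templates define their own arguments and state which of them ReCodEx overrides.

```python
import argparse

# Module-level parser, so that it can be found after the file is imported.
parser = argparse.ArgumentParser()
# Hypothetical arguments; real templates state which ones ReCodEx overrides.
parser.add_argument("--epochs", default=2, type=int, help="Number of epochs.")
parser.add_argument("--seed", default=42, type=int, help="Random seed.")


def main(args: argparse.Namespace) -> dict[str, int]:
    # Placeholder for the actual training and evaluation.
    return {"epochs": args.epochs, "seed": args.seed}


if __name__ == "__main__":
    # Runs only when the file is executed directly; during the ReCodEx import,
    # __name__ is "__recodex__", so this branch is skipped there.
    main_args = parser.parse_args()
    print(main(main_args))
```

Keeping the parser at module level and the call to main behind the __name__ check means that importing the file has no side effects beyond defining parser and main.
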
What are the time and memory limits?

The memory limit during evaluation is 1.5GB. The time limit varies, but it should be at least 10 seconds and at least twice the running time of my solution.

TensorBoard

Should TensorFlow be installed when using TensorBoard?

When TensorBoard starts, it warns about a reduced feature set because of missing TensorFlow, notably
```
TensorFlow installation not found - running with reduced feature set.
```
Do not worry about the warning; there is no need to install TensorFlow.
Cannot start TensorBoard after installation

If you cannot run the tensorboard command after installation, it is most likely not in your PATH. You can either:
- start tensorboard using python3 -m tensorboard.main --logdir logs, or
- add the directory with pip installed packages to your PATH (that directory is either bin, or Scripts on Windows, in your virtual environment if you use one, or it should be ~/.local/bin on Linux and %UserProfile%\AppData\Roaming\Python\Python311 and %UserProfile%\AppData\Roaming\Python\Python311\Scripts on Windows).
What can be logged in TensorBoard?

See the documentation of the SummaryWriter. Common possibilities are:
- scalar values:
```
summary_writer.add_scalar("train/loss", value, step)  # tag, scalar value, global step
```
- tensor values displayed as histograms or distributions:
```
summary_writer.add_histogram("train/output_layer", tensor, step)  # tag, tensor of values, global step
```
- images as tensors with shape [num_images, h, w, channels], where channels can be 1 (grayscale), 2 (grayscale + alpha), 3 (RGB), 4 (RGBA):
```
summary_writer.add_images("train/samples", images, step, dataformats="NHWC")  # tag, batch of images, global step
```
  Other dataformats are "HWC" (shape [h, w, channels]), "HW", "NCHW", "CHW".
- possibly large amount of text (e.g., all hyperparameter values, sample translations in MT, …) in Markdown format:
```
summary_writer.add_text("hyperparameters", markdown, step)  # tag, Markdown string, global step
```
- audio as tensors with shape [1, samples] and values in the $[-1, 1]$ range:
```
summary_writer.add_audio("train/samples", clip, step, sample_rate=sample_rate)  # sample_rate is optional
```
- traced modules using:
```
summary_writer.add_graph(module, example_input_batch)
```

Requirements

To pass the exam, you need to obtain at least 60, 75, or 90 points out of a 100-point exam to receive a grade 3, 2, or 1, respectively. The exam consists of questions worth 100 points in total, drawn from the list below (the questions are randomly generated, but in such a way that there is at least one question from every lecture except the first). In addition, you can get surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues), but only the points you already have at the time of the exam count. You can take the exam without passing the practicals first.
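
The thresholds can be summarized by a small function. This is only a sketch of the arithmetic: it assumes that surplus and community points are simply added to the exam points (with community points capped at 10), and the name exam_grade as well as returning None for an insufficient total are choices made for this illustration.

```python
def exam_grade(exam_points, surplus_points=0, community_points=0):
    """Return grade 1, 2, or 3, or None when the total is below 60 points."""
    # Assumption: all three sources of points are summed; community work is capped.
    total = exam_points + surplus_points + min(community_points, 10)
    for threshold, grade in ((90, 1), (75, 2), (60, 3)):
        if total >= threshold:
            return grade
    return None  # Not enough points to pass.


print(exam_grade(90))                       # exam points only
print(exam_grade(70, surplus_points=5))     # 75 points in total
print(exam_grade(58, community_points=15))  # community points capped at 10
```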

Exam Questions

Lecture 1 Questions

Considering a neural network with $D$ input neurons, a single hidden layer with $H$ neurons, $K$ output neurons, hidden activation $f$ and output activation $a$, list its parameters (including their size) and write down how the output is computed. [5]
List the definitions of frequently used MLP output layer activations (the ones producing parameters of a Bernoulli distribution and a categorical distribution). Then write down three commonly used hidden layer activations (sigmoid, tanh, ReLU). [5]
Formulate the Universal approximation theorem. [5]

Lecture 2 Questions

Define maximum likelihood estimation, and show that it is equal to minimizing NLL, minimizing cross-entropy, and minimizing KL divergence. [10]
Define mean squared error, show how it can be derived using MLE (define $p_{\textrm{model}}$, show how MLE looks using $p_{\textrm{model}}$, and prove that the maximum likelihood estimate is equal to minimizing MSE). [5]
Describe gradient descent and compare it to stochastic (i.e., online) gradient descent and minibatch stochastic gradient descent. [5]
Formulate conditions on the sequence of learning rates used in SGD to converge to optimum almost surely. [5]
Write down the backpropagation algorithm. [5]
Write down the mini-batch SGD algorithm with momentum. Then, formulate SGD with Nesterov momentum and show the difference between them. [5]
Write down the AdaGrad algorithm and show that it tends to internally decay learning rate by a factor of $1/\sqrt{t}$ in step $t$. Then write down the RMSProp algorithm and explain how it solves the problem with the involuntary learning rate decay. [10]
Write down the Adam algorithm. Then show why the bias-correction terms $(1-\beta^t)$ make the estimation of the first and second moment unbiased. [10]

Lecture 3 Questions

Considering a neural network with $D$ input neurons, a single ReLU hidden layer with $H$ units and softmax output layer with $K$ units, write down the explicit formulas (i.e., without differential operators) of the gradient of all the MLP parameters (two weight matrices and two bias vectors), assuming input $\boldsymbol x$, target $g$ and negative log likelihood loss. [10]
Assume a network with MSE loss generated a single output $o \in \mathbb{R}$, and the target output is $g$. What is the value of the loss function itself, and what is the explicit formula (i.e., without a differential operator) of the gradient of the loss function with respect to $o$? [5]
Assume a binary-classification network with cross-entropy loss generated a single output $z \in \mathbb{R}$, which is passed through the sigmoid output activation function, producing $o = \sigma(z)$. If the target output is $g$, what is the value of the loss function itself, and what is the explicit formula (i.e., without a differential operator) of the gradient of the loss function with respect to $z$? [5]
Assume a $K$-class classification network with cross-entropy loss generated a $K$-element output $\boldsymbol z \in \mathbb{R}^K$, which is passed through the softmax output activation function, producing $\boldsymbol o=\operatorname{softmax}(\boldsymbol z)$. If the target distribution is $\boldsymbol g$, what is the value of the loss function itself, and what is the explicit formula (i.e., without a differential operator) of the gradient of the loss function with respect to $\boldsymbol z$? [5]
Define $L_2$ regularization and describe its effect both on the value of the loss function and on the value of the loss function gradient. [5]
Describe the dropout method and write down exactly how it is used during training and during inference. [5]
Describe how label smoothing works for cross-entropy loss, both for sigmoid and softmax activations. [5]
How are weights and biases initialized using the Glorot initialization? [5]

Lecture 4 Questions

Write down the equation of how convolution of a given image is computed. Assume the input is an image $I$ of size $H \times W$ with $C$ channels, the kernel $K$ has size $N \times M$, the stride is $T \times S$, the operation performed is in fact cross-correlation (as usual in convolutional neural networks) and that $O$ output channels are computed. [5]
Explain both SAME and VALID padding schemes and write down the output size of a convolutional operation with an $N \times M$ kernel on an image of size $H \times W$ for both these padding schemes (stride is 1). [5]
Describe batch normalization including all its parameters, and write down the algorithm used during training and the algorithm used during inference. Be sure to explicitly state over what is being normalized in case of fully connected layers and in case of convolutional layers. [10]
Describe overall architecture of VGG-19 (you do not need to remember the exact number of layers/filters, but you should describe the overall order and type of layers that are used). [5]

Lecture 5 Questions

Describe overall architecture of ResNet. You do not need to remember the exact number of layers/filters, but you should draw a bottleneck block (including the applications of BatchNorms and ReLUs) and state how residual connections work when the number of channels increases. [10]
Draw the original ResNet block (including the exact positions of BatchNorms and ReLUs) and also the improved variant with full pre-activation. [5]
Compare the bottleneck block of ResNet and ResNeXt architectures (draw the latter using convolutions only, i.e., do not use grouped convolutions). [5]
Describe the CNN regularization method of networks with stochastic depth. [5]
Compare Cutout and DropBlock. [5]
Describe in detail how CutMix is performed. [5]
Describe Squeeze and Excitation applied to a ResNet block. [5]
Draw the Mobile inverted bottleneck block (including explanation of separable convolutions, the expansion factor, exact positions of BatchNorms and ReLUs, but without describing Squeeze and excitation blocks). [5]
Assume an input image $I$ of size $H \times W$ with $C$ channels, and a convolutional kernel $K$ with size $N \times M$, stride $S$ and $O$ output channels. Write down (or derive) the equation of transposed convolution (or equivalently backpropagation through a convolution to its inputs). [5]

Lecture 7 Questions

Write down how the Long Short-Term Memory (LSTM) cell operates, including the explicit formulas. Also mention the forget gate bias. [10]
Write down how the Gated Recurrent Unit (GRU) operates, including the explicit formulas. [10]
Why can the usual dropout not be used on the recurrent state? Describe how the problem can be alleviated with variational dropout. [5]
Describe layer normalization including all its parameters, and write down how it is computed (be sure to explicitly state over what is being normalized in case of fully connected layers and convolutional layers). [5]
Draw a tagger architecture utilizing word embeddings, recurrent character-level word embeddings (including how these are computed from individual characters), and two sentence-level bidirectional RNNs (explaining the bidirectionality) with a residual connection. Where would you put the dropout layers? [10]

Lecture 8 Questions

In the context of named entity recognition, describe what the BIO encoding is and why it is used. [5]
Write down the dynamic programming algorithm for decoding a BIO-tag sequence, including its asymptotic complexity. [10]
In the context of CTC loss, describe regular and extended labelings and write down the algorithm for computing the log probability of a gold label sequence $\boldsymbol y$. [10]
Describe how CTC predictions are performed using a beam-search. [5]
Draw the CBOW architecture from word2vec, including the sizes of the inputs and the sizes of the outputs and used non-linearities. Also make sure to indicate where the embeddings are being trained. [5]
Draw the SkipGram architecture from word2vec, including the sizes of the inputs and the sizes of the outputs and used non-linearities. Also make sure to indicate where the embeddings are being trained. [5]
Describe the hierarchical softmax used in word2vec. [5]
Describe the negative sampling proposed in word2vec, including the choice of distribution of negative samples. [5]
Explain how ELMo embeddings are trained and how they are used in downstream applications. [5]

Lecture 9 Questions

Considering machine translation, draw a recurrent sequence-to-sequence architecture without attention, both during training and during inference (include embedding layers, recurrent cells, classification layers, argmax/softmax). [5]
Considering machine translation, draw a recurrent sequence-to-sequence architecture with attention, used during training (include embedding layers, recurrent cells, attention, classification layers). Then write down how exactly the attention is computed. [10]
Explain how word embedding tying is used in a sequence-to-sequence architecture, including the necessary scaling. [5]
Write down why subword units are used in text processing, and describe the BPE algorithm for constructing a subword dictionary from a large corpus. [5]
Write down why subword units are used in text processing, and describe the WordPieces algorithm for constructing a subword dictionary from a large corpus. [5]
Pinpoint the differences between the BPE and WordPieces algorithms, both during dictionary construction and during inference. [5]
Describe the Transformer encoder architecture, including the description of self-attention (but you do not need to describe multi-head attention), FFN and positions of LNs and dropouts. [10]
Write down the formula of Transformer self-attention assuming you get sequence representation $\boldsymbol X \in \mathbb{R}^{n \times d}$, and then describe multi-head self-attention in detail, including the dimensionality of the individual heads. [10]
Describe the Transformer decoder architecture, including the description of self-attention and masked self-attention (but you do not need to describe multi-head attention), FFN and positions of LNs and dropouts. Also discuss the difference between training and prediction regimes. [10]

Lecture 10 Questions

Why are positional embeddings needed in Transformer architecture? Write down the sinusoidal positional embeddings used in the Transformer. [5]
Compare RNN to Transformer – what are the strengths and weaknesses of these architectures? [5]
Describe the BERT architecture (you do not need to describe the (multi-head) self-attention operation). Elaborate also on which positional embeddings are used and what the GELU activations are. [10]
Describe the GELU activations and explain why they are a combination of ReLUs and Dropout. [5]
Elaborate on BERT training process (what are the two objectives used and how exactly are the corresponding losses computed). [10]
Describe the architecture of a Vision Transformer – how are input images represented, draw the Transformer encoder layer and the FFN sublayer, how is the distribution over predicted classes computed, what positional embeddings are used (and what alternative positional embeddings were tried). [10]

Lecture 11 Questions

Define the Markov Decision Process, including the definition of the return. [5]
Define the value function, such that all expectations are over simple random variables (actions, states, rewards), not trajectories. [5]
Define the action-value function, such that all expectations are over simple random variables (actions, states, rewards), not trajectories. [5]
Express the value function using the action-value function, and express the action-value function using the value function. [5]
Formulate the policy gradient theorem. [5]
Prove the part of the policy gradient theorem showing the value of $\nabla_{\boldsymbol\theta} v_\pi(s)$. [10]
Assuming the policy gradient theorem, formulate the loss used by the REINFORCE algorithm and show how its gradient can be expressed as an expectation over states and actions. [5]
Write down the REINFORCE algorithm, including the loss formula. [10]
Show that introducing baseline does not influence validity of the policy gradient theorem. [5]
Write down the REINFORCE with baseline algorithm, including both loss formulas. [10]
Sketch the overall structure and training procedure of the Neural Architecture Search. You do not need to describe how exactly the block is produced by the controller. [5]
Write down the variational lower bound (ELBO) in the form of a reconstruction error minus the KL divergence between the encoder and the prior (i.e., in the form used for model training). Then prove it is actually a lower bound on the log-likelihood $\log P(\boldsymbol x)$. [10]
Draw an architecture of a variational autoencoder (VAE). Pay attention to the parametrization of the distribution from the encoder (including the used activation functions), show how to perform latent variable sampling so that it is differentiable with respect to the encoder parameters (the reparametrization trick), and write down the loss. [10]

Related Courses

Machine Learning for Greenhorns

Introductory course to machine learning, focusing both on theoretical foundations as well as on practical applications in Python.

Deep Reinforcement Learning

Course introducing reinforcement learning, from basic tabular methods to involvement of deep neural networks, focusing both on theory as well as on practical aspects.
