1. Introduction
February 21, 2024
Intro to NLP
Questions
Lecturer: Pavel Pecina
Topics:
- Motivation for NLP.
- Basic notions from probability and information theory.
2. Language modeling.
February 28, 2024
Language Models
Questions
Lecturer: Pavel Pecina
Topics:
- Language models.
- The noisy channel model.
- Markov models.
March 6, 2024
IR
Assignment on IR
Lecturer: Pavel Pecina
Topics:
- Intro to IR.
- Boolean model.
- Inverted index.
March 13, 2024
IR cont.
Questions
Lecturer: Pavel Pecina
Topics:
- Probabilistic models for Information Retrieval.
5. Overview of Language Data Resources
March 20, 2024
Data resources
Questions
Assignment on diacritics
Lecturer: Zdeněk Žabokrtský
Lecture topics:
- Types of language data resources.
- Annotation principles.
Practicals:
6. Evaluation measures in NLP
March 27, 2024
Evaluation
Questions
Lecturer: Zdeněk Žabokrtský
Topics:
- Purposes of evaluation.
- Evaluation best practices, estimating upper and lower bounds.
- Task-specific measures.
7. Morphological analysis
April 3, 2024
Morphology
Questions
Lecturer: Daniel Zeman
Topics:
- Morphological tags, parts of speech, morphological categories.
- Finite-state morphology.
(Slides covered down to no. 46. To be completed next week.)
Practicals:
8. Syntactic analysis
April 10, 2024
Syntax
Questions
Lecturer: Daniel Zeman
Topics:
- Dependency vs. phrase-based model.
- Dependency parsing.
9. Introduction to Deep Learning in NLP
April 17, 2024
Deep learning intro
Recording
Assignment on NN interpretation
Lecturer: Jindřich Libovický
Topics:
- Neural network basics
- Word embeddings, sequence-processing architectures
- Pre-trained models: Word2Vec, BERT
The exercise is available in a Google Colab notebook.
10. Deep learning applications in NLP
April 24, 2024
DL in applications
LLMs
Recording
Questions
Lecturer: Jindřich Libovický
Topics:
- Named entity recognition
- Answer span selection
- Generative language models
11. Machine translation
May 15, 2024
MT intro+Word Alignment+PBMT
Word Alignment by Philipp Koehn
Recording of the Lecture
Lab: IBM1 Word Alignment
Lecturer: Ondřej Bojar
Topics:
- Introduction to MT.
- MT evaluation.
- Alignment.
- Phrase-Based MT.
Additional materials:
12. Machine translation, cont.
May 22, 2024
Main Slides: Neural MT
Extra Slides: Transformer
Recording of the Lecture
Questions
Topics:
- Fundamental problems of PBMT.
- Neural machine translation (NMT).
- Brief summary of NNs.
- Sequence-to-sequence, with attention.
- Transformer, self-attention.
- Linguistic features in NMT.
1. Boolean Retrieval
2. Diacritics restoration
3. Analysis of a Trained Model for Sentiment Classification
1. Boolean Retrieval
Deadline: 3rd April 2024, 23:59
- Implement an inverted index that uses a hash table for the dictionary part of the index.
- Implement algorithms for postings intersection and union.
- Index the provided document collection.
- Write a query parser for AND, OR, and NOT.
- Process the provided set of Boolean queries and submit the results.
- Write a short report on your work.
- Submit all the files in a zip archive by email to pecina@ufal.mff.cuni.cz by the given deadline.
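A minimal sketch of the core data structures, assuming naive lowercase whitespace tokenization and string document IDs (illustrative choices, not requirements of the assignment). A Python dict plays the role of the hash-based dictionary, and every postings list is kept sorted so that AND, OR and AND NOT can each be evaluated in a single linear merge pass. The function names are hypothetical, and the query parser is not shown.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each term to a sorted postings list of document IDs.

    The dict is the hash-based dictionary part of the index; the naive
    lowercase whitespace tokenization is an illustrative assumption.
    """
    postings: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1: list[str], p2: list[str]) -> list[str]:
    """AND: merge two sorted postings lists in O(len(p1) + len(p2))."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def union(p1: list[str], p2: list[str]) -> list[str]:
    """OR: merge two sorted postings lists, keeping each ID once."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            out.append(p1[i])
            i += 1
        else:
            out.append(p2[j])
            j += 1
    out.extend(p1[i:])  # at most one of these tails is non-empty
    out.extend(p2[j:])
    return out


def and_not(p1: list[str], p2: list[str]) -> list[str]:
    """AND NOT: IDs in p1 that are absent from p2, in one merge pass."""
    i, j, out = 0, 0, []
    while i < len(p1):
        if j < len(p2) and p2[j] < p1[i]:
            j += 1
        elif j < len(p2) and p2[j] == p1[i]:
            i += 1
            j += 1
        else:
            out.append(p1[i])
            i += 1
    return out
```

Because the lists are sorted, none of the operations needs to look up IDs in a set; this is the same merge idea the lectures use for postings intersection.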
2. Diacritics restoration
Deadline: 15th April 2024, 23:59
- Implement a program that reads Czech text with removed diacritics from STDIN and prints the same text with restored diacritics to STDOUT.
- A possible solution: build a Czech corpus of your own (e.g. by using a few e-books, news articles, Wikipedia, ...) that contains at least 100k tokens (words and punctuation marks). Create a modified copy of the corpus in which all Czech diacritics are removed. Extract a mapping from words without diacritics to words with diacritics. For out-of-vocabulary words, use a letter-trigram language model.
- Evaluate the accuracy of the restoration as the percentage of correct non-whitespace characters in the output.
- Evaluation datasets - 2 randomly chosen articles from vesmir.cz:
- You can use any programming language as long as the program can be compiled/executed on Linux without too much tweaking (in particular, without purchasing any license). Recommended choice: Python 3.
- You can use the devtest data as many times as you need, but you should use the etest data for evaluation only once.
- Ideally, organize the execution of the whole experiment into a Makefile that (after typing make all) downloads your training data, as well as the development and evaluation test sets from the links above, trains the model, applies it on the development data and evaluates the accuracy.
- Write a short summary (1-2 paragraphs) of the experiment and store it into a README file (txt, md, or pdf).
- Submission:
- Please zip the whole directory and send it by email to zabokrtsky@ufal.mff.cuni.cz by the given deadline.
- Alternatively, you can submit the directory with your solution using the faculty GitLab server. Detailed information on creating and using your GitLab repository is available within the course NPFL125. For our course, the instructions need to be adapted in the obvious way:
- Your project name should be "NPFL124"; the identifier should be "npfl124".
- Access to your repository should be given to Zdeněk Žabokrtský.
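A minimal sketch of the suggested word-mapping approach, assuming whitespace tokenization (punctuation stays attached to words) and NFC-normalized UTF-8 text. The letter-trigram fallback for out-of-vocabulary words is deliberately left out: unknown words are copied unchanged. All function names are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text: str) -> str:
    """Remove diacritics: decompose (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train_mapping(tokens: list[str]) -> dict[str, str]:
    """Map each diacritics-free form to its most frequent original spelling."""
    variants: dict[str, Counter] = defaultdict(Counter)
    for tok in tokens:
        variants[strip_diacritics(tok)][tok] += 1
    return {plain: c.most_common(1)[0][0] for plain, c in variants.items()}


def restore(text: str, mapping: dict[str, str]) -> str:
    """Restore word by word; OOV words are copied unchanged
    (this is where a letter-trigram model would be plugged in)."""
    return " ".join(mapping.get(tok, tok) for tok in text.split())


def char_accuracy(predicted: str, gold: str) -> float:
    """Percentage of correct non-whitespace characters.

    Assumes both texts are NFC-normalized and differ only in diacritics,
    so they align character by character once whitespace is removed.
    """
    p = [ch for ch in predicted if not ch.isspace()]
    g = [ch for ch in gold if not ch.isspace()]
    if len(p) != len(g):
        raise ValueError("texts do not align character by character")
    return 100.0 * sum(a == b for a, b in zip(p, g)) / len(g)
```

Note that restore() joins tokens with single spaces, so line breaks are not preserved; this does not affect the metric, which ignores whitespace, but a real STDIN/STDOUT filter would process the input line by line.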
3. Analysis of a Trained Model for Sentiment Classification
Deadline: 13th May 2024, 23:59
In this assignment, you will analyze the weights of a trained neural network. In the practical following Lecture 9, you trained several classifiers for sentiment analysis. Your goal in this assignment is to interpret the weights of one of the networks you trained in the practicals: Model 2, based on 1D convolution. If you did not manage to finish the model in the practical or you are unsure about your solution, you will receive a reference solution of the CNN-based model on April 24 via email from SIS (email the instructor if you do not).
To analyze the response of the convolutional filters, note that the first step of the convolution multiplies the word embeddings by a weight matrix. The output of this multiplication can be viewed as a measure of how strongly the embeddings match the weight vectors of the convolution, the so-called filters. These are the values you will work with.
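A sketch of the response computation in NumPy, assuming the embedding matrix has shape (vocab_size, emb_dim) and the convolution weights follow PyTorch's nn.Conv1d layout (num_filters, emb_dim, kernel_size), with the embedding dimension as the input channels. Converting the tensors to NumPy (e.g. with .detach().cpu().numpy()) is assumed, and only kernels of size 1 are handled. In PyTorch itself, the last line corresponds to torch.topk(responses, k, dim=0).indices. The function name is hypothetical.

```python
import numpy as np


def top_words_per_filter(embeddings: np.ndarray, conv_weight: np.ndarray,
                         k: int = 5) -> np.ndarray:
    """Return a (k, num_filters) array of vocabulary indices with the highest responses.

    embeddings:  (vocab_size, emb_dim), e.g. model.embeddings.weight as NumPy
    conv_weight: (num_filters, emb_dim, kernel_size), nn.Conv1d layout (assumed);
                 index 0 of the last axis is the only position for kernel size 1.
    """
    filters = conv_weight[:, :, 0]        # (num_filters, emb_dim)
    responses = embeddings @ filters.T    # (vocab_size, num_filters): dot products
    # Sort each column (one filter) in descending order over the vocabulary axis.
    return np.argsort(-responses, axis=0)[:k]
```

Column f of the result holds the vocabulary IDs that best match filter f; these can then be passed to the tokenizer to obtain the tokens.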
- Using the input word embeddings (you will likely find them in model.embeddings.weight) and the convolutional filter weights (likely in model.conv[0][1].weight), find the words that lead to the highest filter responses. The response is computed as the dot product of the respective word embedding and a vector from the weight matrix (you will have to transpose the weights correctly; then you can find the best-scoring words using the topk function, taking care to choose the correct dimension). For simplicity, you can work only with kernels of size 1, but feel free to consider longer spans too. (The method tokenizer.convert_ids_to_tokens might be useful to convert the indices back to tokens.) [50% of the assignment]
- Look at the results and qualitatively assess which words appear among the best-scoring ones. Write a few paragraphs of 100 to 400 words. [20% of the assignment]
- Analyze which POS trigger the convolutional filters the most: compute statistics of how often different POS appear among the best-scoring words. For each word, consider only its most frequent POS tag. (You can get the most frequent POS tags, e.g., from the English Web Treebank.) Speculate about the reasons for the statistics you observe. Present your results in a table and write your thoughts and comments in at most 200 words. [30% of the assignment]
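A sketch of the POS statistic using only the standard library. It assumes you have already extracted (word, tag) pairs, e.g. from the English Web Treebank (the loading step is not shown), and a flat list of best-scoring tokens. Tokens without a known tag, such as subword pieces, are counted under a hypothetical "UNKNOWN" label; both function names are hypothetical as well.

```python
from collections import Counter, defaultdict


def most_frequent_pos(tagged_tokens) -> dict[str, str]:
    """Map each lowercased word to its most frequent POS tag.

    tagged_tokens: iterable of (word, tag) pairs from a POS-tagged corpus.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def pos_distribution(top_words, word2pos: dict[str, str]) -> dict[str, float]:
    """Relative frequency of POS tags among the best-scoring words."""
    tags = Counter(word2pos.get(w.lower(), "UNKNOWN") for w in top_words)
    total = sum(tags.values())
    return {tag: n / total for tag, n in tags.most_common()}
```

The returned dict is already ordered from the most to the least frequent tag, which makes it easy to turn into the requested table.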
Feel free to use the Colab Notebook from the practicals as a starting point. You can save the weights of your trained model to your Google Drive and load them from a file, so you do not have to train the model every time you work on the assignment.
Please write all your code in a replicable way into the notebook. Interleave the code with the text of your analysis and answer the questions. Write 1-2 paragraphs for each of the points in English, Czech or Slovak. Please submit a sharing link to your notebook via the following form.
Pool of possible exam questions
All variants of the final written exam tests will be assembled exclusively from questions selected from the following list:
(warning: the question list might be subject to occasional changes during the semester; the final version will be announced here no later than three weeks before the first exam date.)
Basic notions from probability and information theory.
- What are the three basic properties of a probability function? (1 point)
- When do we say that two events are (statistically) independent? (1 point)
- Show how Bayes' Theorem can be derived. (1 point)
- Explain the chain rule. (1 point)
- Explain the notion of entropy (formula expected too). (1 point)
- Explain the Kullback-Leibler distance (formula expected too). (1 point)
- Explain mutual information (formula expected too). (1 point)
Language models. The noisy channel model.
- Explain the notion of the noisy channel. (1 point)
- Explain the notion of the n-gram language model. (1 point)
- Describe how the maximum likelihood estimate of a trigram language model is computed. (2 points)
- Why do we need smoothing (in language modelling)? (1 point)
- Give at least two examples of smoothing methods. (2 points)
Morphological analysis.
- What is a morphological tag? List at least five features that are often encoded in morphological tag sets. (1 point)
- List the open and closed part-of-speech classes and explain the difference between open and closed classes. (1 point)
- Explain the difference between a finite-state automaton and a finite-state transducer. Describe the algorithm of using a finite-state transducer to transform a surface string to a lexical string (pseudocode or source code in your favorite programming language). (2 points)
- Give an example of a phonological or an orthographical change caused by morphological inflection (any natural language). Describe the rule that would take care of the change during analysis or generation. It is not required that you draw a transducer, although drawing a transducer is one of the possible ways of describing the rule. (1 point)
- Give an example of a long-distance dependency in morphology (any natural language). How would you handle it in a morphological analyzer? (1 point)
Syntactic analysis.
- Describe dependency trees, constituent trees, differences between them and phenomena that must be addressed when converting between them. (2 points)
- Give an example of a sentence (in any natural language) that has at least two plausible, semantically different syntactic analyses (readings). Draw the corresponding dependency trees and explain the difference in meaning. Are there other additional readings that are less probable but still grammatically acceptable? (2 points)
- What is coordination? Why is it difficult in dependency parsing? How would you capture coordination in a dependency structure? What are the advantages and disadvantages of your solution? (1 point)
- What is ellipsis? Why is it difficult in parsing? Give examples of different kinds of ellipsis (any natural language). (1 point)
Information retrieval.
- Explain the difference between information need and query. (1 point)
- What is an inverted index and what are the optimal data structures for it? (1 point)
- What is a stopword and what is it useful for? (1 point)
- Explain the bag-of-words principle. (1 point)
- What are the main advantage and disadvantage of the Boolean model? (1 point)
- Explain the role of the two components in the TF-IDF weighting scheme. (1 point)
- Explain length normalization in the vector space model. What is it useful for? (1 point)
Language data resources.
- Explain what a corpus is. (1 point)
- Explain what annotation is (in the context of language resources). What types of annotation do you know? (2 points)
- What are the reasons for the variability of even basic types of annotation, such as the annotation of morphological categories (parts of speech etc.)? (1 point)
- Explain what a treebank is. Why are trees used? (2 points)
- Explain what a parallel corpus is. What kinds of alignment can we distinguish? (2 points)
- What is a sentiment-annotated corpus? How can it be used? (1 point)
- What is a coreference-annotated corpus? (1 point)
- Explain how WordNet is structured. (1 point)
- Explain the difference between derivation and inflection. (1 point)
Evaluation measures in NLP.
- Give at least two examples of situations in which measuring a percentage accuracy is not adequate. (1 point)
- Explain precision and recall. (1 point)
- What is F-measure, and what is it useful for? (1 point)
- What is k-fold cross-validation? (1 point)
- Explain BLEU (the exact formula not needed, just the main principles). (1 point)
- Explain the purpose of brevity penalty in BLEU. (1 point)
- What is Labeled Attachment Score (in parsing)? (1 point)
- What is Word Error Rate (in speech recognition)? (1 point)
- What is inter-annotator agreement? How can it be measured? (1 point)
- What is Cohen's kappa? (1 point)
Deep learning for NLP.
- Describe the two methods for training the Word2Vec model. (1 point)
- Explain the difference between Word2Vec and FastText embeddings. (1 point)
- Explain convolutional networks for sequence processing. (1 point)
- What are residual connections in neural networks? Why do we use them? (1 point)
- Explain layer normalization and its effect on the training process. (1 point, 2 points with formula)
- Explain the vanishing gradient problem in recurrent neural networks; name architectures that deal with the issue. (1 point)
- Describe the LSTM networks. (1 point)
- Use formulas to express the loss function for training sequence labeling. (1 point)
- Sketch the structure of the Transformer model. (2 points)
- Why do we use positional encodings in the Transformer model? (1 point)
- Explain the training procedure of the BERT model. (2 points)
Machine translation fundamentals.
- Why is MT difficult from a linguistic point of view? Provide examples and explanations for at least three different phenomena. (2 points)
- Why is MT difficult from a computational point of view? (1 point)
- Briefly describe at least three methods of manual MT evaluation. (1-2 points)
- Describe BLEU. 1 point for the core properties explained, 1 point for the (commented) formula.
- Describe IBM Model 1 for word alignment, highlighting the EM structure of the algorithm. (1 point)
- Explain, using equations, the relation between the noisy channel model and the log-linear model for classical statistical MT. (2 points)
- Describe the loop of weight optimization for the log-linear model as used in phrase-based MT. (1 point)
Neural machine translation.
- Describe the critical limitation of PBMT that NMT solves. Provide example training data and an example input where PBMT is very likely to introduce an error. (1 point)
- Use formulas to highlight the similarity of NMT and LMs. (1 point)
- Describe how words are fed to current NMT architectures and explain why this is beneficial compared to a one-hot representation. (1 point)
- Sketch the structure of an encoder-decoder architecture of neural MT, remember to describe the components in the picture (2 points)
- What is the difference in RNN decoder application at training time vs. at runtime? (1 point)
- What problem does attention in NMT address? Provide the key idea of the method. (1 point)
- What problem/task do both RNN and self-attention resolve and what is the main benefit of self-attention over RNN? (1 point)
- What are the three roles that each state at a Transformer encoder layer takes in self-attention? (1 point)
- What are the three uses of self-attention in the Transformer model? (1 point)
- Provide an example of NMT improvement that was assumed to come from additional linguistic information but occurred also for a simpler reason. (1 point)
- Summarize and compare the strategy of "classical statistical MT" vs. the strategy of neural approaches to MT. (1 point)
Homework assignments
- There will be 3 homework assignments.
- For each assignment, you will get points, up to a given maximum
(the maximum is specified with each assignment).
- All assignments will have a fixed deadline (usually in two weeks).
- If you submit the assignment after the deadline, you will get:
- up to 50% of the maximum points if it is less than 2 weeks after the deadline;
- 0 points if it is more than 2 weeks after the deadline.
- Once we check the submitted assignments, you will see the points you got and
the comments from us in:
- To be allowed to take the test (which is required to pass the course), you need to get at least 50% of the total points from
the assignments.
Exam test
Grading
Your grade is based on the average of your performance;
the exam test and the homework assignments are weighted 1:1.
- ≥ 90%: grade 1 (excellent)
- ≥ 70%: grade 2 (very good)
- ≥ 50%: grade 3 (good)
- < 50%: grade 4 (fail)
For example, if you get
600 out of 1000 points for homework assignments (60%)
and 36 out of 40 points for the test (90%),
your total performance is 75% and you get a 2.
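The grading rule can be checked with a small helper. This sketch assumes the overall performance is the plain average of the two percentages, as in the example above, and uses exact fractions so that boundary cases such as exactly 90% are not affected by floating-point rounding. The function name is hypothetical, and the 50% homework prerequisite for taking the test is not modeled.

```python
from fractions import Fraction


def final_grade(hw_points: int, hw_max: int,
                test_points: int, test_max: int) -> tuple[Fraction, int]:
    """Average homework and test performance 1:1 and map it to a grade."""
    total = (Fraction(hw_points, hw_max) + Fraction(test_points, test_max)) / 2
    if total >= Fraction(90, 100):
        grade = 1  # excellent
    elif total >= Fraction(70, 100):
        grade = 2  # very good
    elif total >= Fraction(50, 100):
        grade = 3  # good
    else:
        grade = 4  # fail
    return total, grade


# The example from the text: 60% homework and 90% test give 75%, i.e. grade 2.
total, grade = final_grade(600, 1000, 36, 40)
print(f"{float(total):.0%} -> grade {grade}")  # prints "75% -> grade 2"
```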
No cheating
- Cheating is strictly prohibited and any student found cheating will be punished.
The punishment can involve failing the whole course, or, in grave cases,
being expelled from the faculty.
- Discussing homework assignments with your classmates is OK. Sharing code is
not OK (unless explicitly allowed); by default, you must complete the assignments yourself.
- All students involved in cheating will be punished. E.g. if you share
your assignment with a friend, both you and your friend will be punished.