Summer Semester 2017
Student Presentations
Seminars
March 1
Course logistics ∍ prerequisites ⚫ syllabus ⚫ how to get credits
Notes on deep learning ∍ deep learning ⚫ network building blocks ⚫ network components as functional programming ⚫ deep learning alchemy ⚫ reading the learning curves
Recurrent Neural Networks ∍ definition ⚫ RNN as a program ⚫ exercise with Euclid's algorithm
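A minimal illustration of the "RNN as a program" idea behind the exercise: Euclid's algorithm is a loop that applies the same update rule to a carried state at every step, just as an RNN applies one recurrent cell repeatedly to its hidden state.

```python
def gcd(a, b):
    """Euclid's algorithm: a loop with a carried state."""
    # Each iteration applies the same update rule to the state (a, b),
    # analogous to an RNN cell updating its hidden state at each step.
    while b != 0:
        a, b = b, a % b
    return a
```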
Reading: | Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014. |
Question: |
What are the problems of the presented architecture? How do you think neural MT research continued after this paper was published? |
Project proposals for NPFL087 Statistical Machine Translation.
March 8
Recurrent Neural Networks ∍ vanilla RNNs ⚫ vanishing gradient problem ⚫ understanding LSTMs ⚫ Gated Recurrent Units ⚫ neural language models ⚫ word embeddings ⚫ sampling from a language model
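A sketch of one vanilla RNN update in NumPy (names and toy dimensions are illustrative, not from the lecture materials); the comment points at where the vanishing-gradient problem comes from.

```python
import numpy as np

def rnn_step(x, h, W_xh, W_hh, b_h):
    # One vanilla RNN update: h' = tanh(x W_xh + h W_hh + b).
    # Backpropagating through many such steps multiplies repeatedly by
    # W_hh and by tanh derivatives (< 1), which shrinks the gradient --
    # the vanishing-gradient problem that LSTMs and GRUs address.
    return np.tanh(x @ W_xh + h @ W_hh + b_h)

# Toy dimensions: input size 4, hidden size 3.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
h = np.zeros(3)
W_xh = rng.normal(size=(4, 3))
W_hh = rng.normal(size=(3, 3))
b_h = np.zeros(3)
h_next = rnn_step(x, h, W_xh, W_hh, b_h)
```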
Reading: | Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014). |
Question: | What do you think is the main difference between Bahdanau's attention model and the concept of alignment in statistical MT? |
March 15
Attentive sequence-to-sequence learning ∍ RNN as a probabilistic model ⚫ encoder-decoder architecture ⚫ training vs. runtime decoding ⚫ Neural Turing Machines as a motivation for attention ⚫ attention model ⚫ attention vs. alignment
Implementation and performance ∍ computational graph & backpropagation ⚫ memory consumption
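The core of the attention model above can be sketched in a few lines: score each encoder state against the decoder's query, normalise the scores with a softmax into alignment weights, and take the weighted sum as the context vector. This sketch uses dot-product scoring for brevity; Bahdanau et al. use a small MLP scorer instead.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def attend(query, enc_states):
    # enc_states: (T, d) encoder hidden states; query: (d,) decoder state.
    scores = enc_states @ query      # one score per source position
    alpha = softmax(scores)          # soft alignment weights, sum to 1
    context = alpha @ enc_states     # weighted sum of encoder states
    return context, alpha
```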
Reading: | Chung, Junyoung, Kyunghyun Cho, and Yoshua Bengio. "A character-level decoder without explicit segmentation for neural machine translation." arXiv preprint arXiv:1603.06147 (2016). |
Question: | What are the reasons authors do not use character-level encoder? How would you improve the architecture such that it would allow character level encoding? |
March 23
Model Ensembling and Beam Search ∍ beam search ⚫ ensembles ⚫ computing in log domain
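A toy sketch of beam search with log-domain scoring (the `next_logprobs` interface is an assumption for illustration, not a particular toolkit's API): hypotheses are scored by summed log-probabilities, since adding logs avoids the underflow of multiplying many small probabilities.

```python
import heapq
import math

def beam_search(next_logprobs, beam_size, max_len):
    """next_logprobs(prefix) -> iterable of (token, log_prob) continuations."""
    beams = [(0.0, ())]  # (cumulative log-prob, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            for tok, lp in next_logprobs(prefix):
                # Log domain: multiply probabilities by adding their logs.
                candidates.append((score + lp, prefix + (tok,)))
        beams = heapq.nlargest(beam_size, candidates)  # keep top-k hypotheses
    return beams
```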
Big vocabulary problem ∍ copy from source ⚫ subword units ⚫ character-level methods
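The subword-unit approach to the big-vocabulary problem can be illustrated with a bare-bones byte-pair-encoding learner in the spirit of Sennrich et al.: repeatedly count adjacent symbol pairs and merge the most frequent one. This sketch omits the end-of-word markers and tie-breaking details of real implementations.

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """word_counts: {tuple of symbols: frequency}; returns (merges, vocab)."""
    vocab = dict(word_counts)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge: replace each occurrence of the pair with one symbol.
        new_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab
```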
Reading: | Sennrich, Rico, et al. "Nematus: a Toolkit for Neural Machine Translation." arXiv preprint arXiv:1703.04357 (2017). |
Question: | Compare the Nematus models with the models from Bahdanau et al., 2014. How do they differ? Think of at least three differences. |
March 29
Implementation in TensorFlow
Reading: | Shen, Shiqi, et al. "Minimum Risk Training for Neural Machine Translation." Proceedings of ACL 2016 (2016). |
Question: | ??? |
April 5
Advanced Optimization ∍ reinforcement learning ⚫ minimum risk training
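The minimum risk training objective from the Shen et al. reading can be sketched numerically: the expected loss (e.g. 1 − BLEU) over sampled candidate translations, weighted by the model's probabilities sharpened by a temperature `alpha` and renormalised over the sample. Function and parameter names here are illustrative.

```python
import numpy as np

def expected_risk(sample_logprobs, sample_losses, alpha=0.005):
    # Minimum risk training objective: E_q[loss], where q is the model
    # distribution restricted to the sampled candidates, sharpened by
    # alpha and renormalised so the weights sum to 1.
    q = np.exp(alpha * np.asarray(sample_logprobs))
    q = q / q.sum()
    return float(q @ np.asarray(sample_losses))
```

With equal sample log-probabilities the weights are uniform, so the expected risk reduces to the mean loss over the samples.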