(Not enough complaining happened, so this is the time we're going with.)
The exam is oral. It will probably take place in the corridor in front of my office (S424, 4th floor, corridor leading towards S1). The dates and times will appear in SIS. My preliminary plan is to offer slots every workday afternoon in the weeks starting Jan 20 and Jan 27, with Feb 10-12 as last-minute options before the semester ends.
There are two components to the exam: an in-depth discussion of literature of your choice, as advertised, and a few more surface-level questions.
Here is a reading list compiled from all the literature that was mentioned in the lectures (and referenced in the slides).
The exam requires specific preparation to be completed and submitted 36 hours in advance of your exam slot (which will also be the cutoff for signing out of the exam; if SIS doesn't let me set 36 hours, the cutoff will be 48 h). Make sure you read the following instructions thoroughly, as well as the supplementary information below. If something is unclear, ask on the Discord. The preparation requirements are designed around the course's learning outcomes, so that you come to the exam with a very good chance of passing.
All the materials that you prepare will be available to you during the exam.
Submit all materials (your imaginary task & challenge specification and the literature review forms) at least 36 hours before the start of your selected exam block. (I have to (re-)read the articles too. If 3-4 of you sign up for the same day, that can be up to ~12 hours of work, which is 1.5 workdays.)
(ctrl+c, ctrl+v into your favorite note-taking app?)
The selected task itself is not evaluated on anything other than internal consistency. It has no bearing on the exam result whether your task is just for your own fun or aims at curing Parkinson's disease. You can do weird things or standard things; the originality of the task itself is not evaluated at all. If you feel stuck on this, choose randomly, or write on Discord and I will give you some ideas. :)
This is still just a structure for an exam, so there is no requirement to pick the most state-of-the-art, best-performing thing. Imagine that you are doing this literature review in the year when the latest of your selected articles came out. (This is especially because not everyone has done deep learning, which is currently the dominant methodology for most computational music processing tasks. As an exercise, "no deep learning" can easily be one of the requirements you identify in Step 1, if you can justify it.)
If you find an obviously relevant article that the originally selected Article 1 missed and it's a long journal paper or a book chapter (beyond ~20 pages), it can count as both Article 2 and Article 3. But this is likely the less interesting option, and might in fact be more work, because if something takes over 20 pages, there are probably a lot of important details in the math.
All the written materials submitted in preparation for the exam may also be written in Czech, if writing them in English is a problem for you. The exam can take place in Czech or English. You will probably want to submit the required forms in your preferred language, but you can choose the language of the exam on the spot.
You are free to use whatever tools you want in your preparation, by which I am of course hinting at the various breeds of LLMs. Note that I will also ask about details in the papers, so don't expect to get away with not actually reading them yourselves. :) But of course feel free to ask the garden of GPTs & co. to point you in the right direction if that is a good way for you to get unstuck during preparation.
If you fail, you can re-use the topic and literature for the next attempt -- to the extent we agree on, to allow for adaptations to issues such as an underspecified task, a key challenge that does not in fact address the task, literature that is not as relevant as you thought, etc.
Aside from discussing your selected topic in depth, I will ask 2-3 random questions about things that are in the slides. If you do well on your in-depth topic, there is almost no way this part fails you. Example questions: "What is a spectrogram and what are some of its parameters?" "What are the main approaches to optical music recognition?" "What is the PKSpell algorithm for?" "How would you synthesize data for a live score following task?" "How is automatic music transcription usually evaluated?" "What is the relationship between pitch and f0?"
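(If you want a quick refresher for the spectrogram question, here is a minimal sketch, assuming Python with numpy/scipy and any WAV recording at hand; "example.wav" is just a placeholder. It shows the parameters worth mentioning: window length, hop size, and window function.)

    import numpy as np
    from scipy import signal
    from scipy.io import wavfile

    # Load a recording and mix it down to mono.
    sr, y = wavfile.read("example.wav")  # placeholder file name
    if y.ndim > 1:
        y = y.mean(axis=1)

    # The main spectrogram parameters:
    n_fft = 2048  # window length: trades frequency resolution against time resolution
    hop = 512     # hop size: spacing of analysis frames (here 75% overlap)
    # The window function ("hann" below) controls spectral leakage.
    f, t, S = signal.stft(y, fs=sr, window="hann",
                          nperseg=n_fft, noverlap=n_fft - hop)

    # Magnitude in dB, which is what spectrogram plots usually show.
    S_db = 20 * np.log10(np.abs(S) + 1e-10)
    print(S_db.shape)  # (frequency bins, time frames)

Playing with n_fft and hop and watching the time/frequency trade-off change is a good way to internalize the answer.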
Finally, help each other out. Ask about stuff from your papers on Discord. If you have trouble understanding something, odds are someone else has the same trouble, so you're likely helping someone else too. Conversely, the energy you put into answering someone else's question is rarely misspent, even for yourself. Organize study groups. Do dry runs of your prepared material with each other. Playing the pretend examiner and trying to come up with tricky questions to ask is an especially excellent exercise.
Have fun!
A successful student of NPFL144 will:
and find out whether a topic perhaps inspires them to do a project of their own.
Note: please always check on the day of the lecture whether you can access the required reading (usually by using your university login), and if you can't, contact me immediately (ideally via Discord, because if you can't access an article, others likely can't either).
Lecture 1. Small, Christopher. 1999. “Musicking — the Meanings of Performing and Listening. A Lecture.” Music Education Research 1 (1): 9–22. doi:10.1080/1461380990010102. https://www.tandfonline.com/doi/abs/10.1080/1461380990010102
Lecture 2.1 P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 2003, pp. 177-180. https://ieeexplore.ieee.org/document/1285860
Lecture 2.2 N. Bertin, R. Badeau and E. Vincent, "Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 538-549, March 2010. https://ieeexplore.ieee.org/document/5410052
Lecture 3. Foscarin, Francesco, Jan Schlüter, and Gerhard Widmer. "Beat this! Accurate beat tracking without DBN postprocessing." arXiv preprint arXiv:2407.21658 (2024). https://arxiv.org/pdf/2407.21658
Lecture 4. ...anything from the slides so far.
Lecture 5. Nakamura, Eita, Kazuyoshi Yoshii, and Haruhiro Katayose. "Performance Error Detection and Post-Processing for Fast and Accurate Symbolic Music Alignment." In ISMIR, pp. 347-353. 2017. http://sap.ist.i.kyoto-u.ac.jp/members/yoshii/papers/ismir-2017-nakamura.pdf
Lecture 6. Jeong, D., Kwon, T., Kim, Y., & Nam, J. (2019). Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance. Proceedings of the 36th International Conference on Machine Learning 97:3060-3070. http://proceedings.mlr.press/v97/jeong19a.html
Lecture 7.1 Byrd, Donald, and Jakob Grue Simonsen. "Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images." Journal of New Music Research 44, no. 3 (2015): 169-195. https://www.tandfonline.com/doi/full/10.1080/09298215.2015.1045424
Lecture 7.2 Torras, Pau, Sanket Biswas, and Alicia Fornés. "A unified representation framework for the evaluation of Optical Music Recognition systems." International Journal on Document Analysis and Recognition (IJDAR) 27, no. 3 (2024): 379-393. https://link.springer.com/article/10.1007/s10032-024-00485-8
Lecture 8. Mauch, Matthias, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. "The Evolution of Popular Music: USA 1960–2010." Royal Society Open Science 2 (2015): 150081. https://royalsocietypublishing.org/doi/10.1098/rsos.150081
Lecture 9. Varnum, Michael E. W., Jaimie Arona Krems, Colin Morris, Alexandra Wormley, and Igor Grossmann. "Why are song lyrics becoming simpler? A time series analysis of lyrical complexity in six decades of American popular music." PLOS ONE 16, no. 1 (2021): e0244576. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244576
Lecture 10. ...try out a music generation system of your choice & read the paper for that.
Lecture 11. Karlijn Dinnissen and Christine Bauer. 2023. Amplifying Artists’ Voices: Item Provider Perspectives on Influence and Fairness of Music Streaming Platforms. In Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization (UMAP '23). ACM, New York, NY, USA, 238–249. https://dl.acm.org/doi/pdf/10.1145/3565472.3592960
Lecture 12. McBride, John M., Sam Passmore, and Tsvi Tlusty. "Convergent evolution in a large cross-cultural database of musical scales." PLOS ONE 18, no. 12 (2023): e0284851. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0284851
The subject introduces participants to computational music processing in both industry and academia: music representations, from audio through symbolic representations (MIDI, MusicXML) to the visual domain (sheet music), and methods from signal processing to machine learning. This subject is a good basis for music-related software projects or theses. Knowledge of music theory and notation is not required (the essentials will be explained), but if you have never had any contact with music, we recommend reading up on terms like harmony or musical form. The subject will be taught in English.
Note on schedule: this is not a centrally scheduled subject, so scheduling will happen once we know who signed up (more or less democratically, depending on how many people are interested).
The subject's Discord channel is already open -- you can join and ask about whatever you might want. (Joining the Discord of course doesn't obligate you in any way to enroll in the subject; it's also there to help you decide whether you want to take it.)
Müller, Meinard. Fundamentals of Music Processing Using Python and Jupyter Notebooks. Cham: Springer, 2021.
https://link.springer.com/content/pdf/10.1007/978-3-030-69808-9.pdf
Müller, Meinard, and Frank Zalkow. "libfmp: A Python package for fundamentals of music processing." Journal of Open Source Software 6, no. 63 (2021): 3326.
https://joss.theoj.org/papers/10.21105/joss.03326.pdf
Lerch, Alexander. An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications. 2nd Edition. New York: Wiley-IEEE Press, 2023.
Freely available as slides: https://github.com/alexanderlerch/ACA-Slides
and accompanying code: https://github.com/alexanderlerch/pyACA
and website: https://www.audiocontentanalysis.org/
Knees, Peter, and Markus Schedl. Music Similarity and Retrieval: An Introduction to Audio- and Web-based Strategies. Vol. 36. Heidelberg: Springer, 2016.
https://link.springer.com/content/pdf/10.1007/978-3-662-49722-7.pdf
A recent tutorial, Deep Learning 101 in Audio-based MIR (ISMIR 2024, San Francisco), has an accompanying online book and code as Google Colab notebooks:
https://geoffroypeeters.github.io/deeplearning-101-audiomir_book/front.html
As you have probably noted, some of the instructional literature on computational music processing pre-dates the boom of deep learning. However, the methods presented there are often still valid and in many application scenarios good enough, and make for good baselines before wheeling out the deep learning artillery.
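To illustrate how lightweight such a baseline can be, here is a toy sketch of spectral-flux onset detection in plain numpy/scipy (my own illustrative example, not taken from any of the books above; the threshold values and "example.wav" are arbitrary placeholders):

    import numpy as np
    from scipy import signal
    from scipy.io import wavfile

    def spectral_flux(y, sr, n_fft=1024, hop=256):
        """Summed positive magnitude change between consecutive STFT frames."""
        _, _, S = signal.stft(y, fs=sr, window="hann",
                              nperseg=n_fft, noverlap=n_fft - hop)
        mag = np.abs(S)
        flux = np.maximum(mag[:, 1:] - mag[:, :-1], 0.0).sum(axis=0)
        return flux / (flux.max() + 1e-10)  # normalize to [0, 1]

    sr, y = wavfile.read("example.wav")  # placeholder file name
    if y.ndim > 1:
        y = y.mean(axis=1)  # mix down to mono

    hop = 256
    flux = spectral_flux(y.astype(float), sr, hop=hop)
    # Naive peak picking: local maxima above an arbitrary global threshold.
    peaks, _ = signal.find_peaks(flux, height=0.3, distance=10)
    print(peaks * hop / sr)  # onset candidates, in seconds

A dozen lines like these won't beat a trained model, but they give you an honest point of comparison before the deep learning artillery rolls in.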
1. Music and its formalizations. Sound vs. tone. Elementary musical features (tempo, beat, harmony, melody). User roles (listener, distributor, musician).
2. Basics of musical audio processing: signal, sampling, convolution and the deconvolution problem. Resonance, the harmonic series, and timbre.
3. Audio feature extraction. Tones and pitches. Automated transcription of monophonic and polyphonic recordings, melody extraction, harmony and genre. Beat tracking, downbeat and tempo estimation. Source separation.
4. Symbolic music description. MIDI, matrix view. Music notation formats: ABC, Humdrum, LilyPond, MusicXML, MEI. Selected databases of symbolic music.
5. Visual representations of music: notation. OMR worldwide and at matfyz.
6. Musical similarity in symbolic representations and in audio. Search, multimodality: query by humming.
7. Multimodality and performance. Score following with and without symbolic representations and its applications. Modeling music expression. Automatic adaptive accompaniment.
8. Singing and lyrics. Singing voice detection, singing voice synthesis, automatic transcription and alignment of sung text.
9. Digital music history. Databases: RISM, F-Tempo, Cantus. Examples of digital editions (Mozart, Josquin), popular music databases (Billboard, Million Song Dataset).
10. Music generation. Algorithmicity, chance, and generative artificial intelligence. Various human-in-the-loop systems.
11. Music distribution. Recommender systems: collaborative filtering, the cold-start problem. Copyright: Content ID, fingerprinting. Cover song identification.
12. Music cognition. EEG and music, entrainment, music therapy: Parkinson's disease, depression, Alzheimer's disease.
13. Non-European musical cultures. Ragas, Chinese music, Maqams, Arab-Andalusian music. Folk music. Cultural evolution perspectives.
14. The world of computational music processing: industry, academia, important online resources. Worldwide/Europe/Czechia/matfyz. Important open-source libraries.
15. Recap for exam, reserve time, discussion space.
(The subject does not touch on digital music production and audio engineering: no live coding, no DAW work, no VST plugins, etc.)