Protein Ligandability Recognition
- Sample proteins: plr.sample.zip
- Important: Do NOT use this sample data to solve the assigned tasks, neither for learning nor for evaluation.
- Data sets to be used for your exercise: set.A.zip, set.B.zip, set.C.zip, set.D.zip
- Feature set description: plr.attributes.pdf
Native Language Identification
- Data for clustering
- Development data fv.c.1.gram.rel.traindev.csv, train.txt, dev.txt
Selected references
- Hladká Barbora, Holub Martin, Kríž Vincent: Feature Engineering in the NLI Shared Task 2013: Charles University Submission Report. In: Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, ACL, Atlanta, Georgia, USA, pp. 232-241, 2013.
- Kríž Vincent, Holub Martin, Pecina Pavel: Feature Extraction for Native Language Identification Using Language Modeling. In: Proceedings of Recent Advances in Natural Language Processing, Research Group in Computational Linguistics, University of Wolverhampton, UK, Hissar, Bulgaria, ISSN 1313-8502, pp. 298-306, 2015.
- Ircing Pavel, Švec Jan, Zajíc Zbyněk, Hladká Barbora, Holub Martin: Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques. In: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, Stroudsburg, PA, USA, pp. 198-209, 2017.
Movie Recommendation Task
- Feature description mov.handout
- Development data mov.development.csv
- Development data for cross-validation mov.data.zip
- Load the data into R using load-mov-data.R
Word Sense Disambiguation
- Feature description wsd.attributes.pdf
- Development data wsd.development.csv
Semantic Collocations Identification
- Task description col-description.pdf
- Feature description col.attributes.pdf
- Development data col.development.csv
Selected data sets
Auto
- library(ISLR); dim(Auto) # [1] 392 9
- description
Baseball players
- baseball.players.csv
- examples = read.csv("baseball.players.csv", header=TRUE, sep=";")
- dim(examples) # [1] 1034 6
- description
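The loading step above relies on sep=";" because the file is semicolon-separated. A minimal sketch of the same pattern follows; it writes a tiny stand-in file first so it runs without baseball.players.csv. The column names and rows are invented for illustration and are an assumption, not the contents of the real file.

```r
# Hypothetical stand-in for baseball.players.csv: the column names and
# the two rows below are invented for illustration only.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Name;Team;Position;Height;Weight;Age",
  "Player A;Team X;Catcher;74;180;22.99",
  "Player B;Team Y;Outfielder;72;215;34.69"
), tmp)

# sep = ";" matters: with the default comma separator each row
# would be read into a single column.
examples <- read.csv(tmp, header = TRUE, sep = ";")

dim(examples)  # number of rows, number of columns
str(examples)  # column names and types
```

Checking dim() right after loading is a quick way to notice a wrong separator: a single column in the result usually means sep did not match the file.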
Caravan
- library(ISLR); dim(Caravan) # [1] 5822 86
- description
College
- library(ISLR); dim(College) # [1] 777 18
- dataset-29177.csv
- description
Students
- examples = read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/sample.csv", header=TRUE)
- dim(examples) # [1] 200 6
USArrests
- dim(USArrests) # [1] 50 4
- description
Titanic data set
Example data: evaluation of a written test (data and description of the fields)
ZS.2018 — lectures