Readability is the ease with which a reader can understand a text. Various readability metrics have been developed, mainly for English. Although some of them are based on seemingly language-independent features, such as the number of syllables, their interpretation is language-dependent. Copywriters can use numerous readability-assessment tools (e.g. Coh-Metrix, Schreiblabor, and Jasnopis), which combine many features at once. Many of these features are inevitably language-dependent: to count verbs and nouns, for instance, a tool needs a part-of-speech tagger, or at least a morphological lexicon, for the given language.
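As an illustration of a feature-based formula, here is a minimal Python sketch of the Flesch Reading Ease score, a classic English metric built from average sentence length and average syllables per word. Its constants (206.835, 1.015, 84.6) were fitted to English texts, which is why the same score cannot be read the same way for Czech. The vowel-group syllable counter and the regex-based sentence and word splitting are simplifying assumptions made for this sketch only; they handle plain English text and would not cope with Czech diacritics or syllabic consonants (as in "vlk").

```python
import re


def count_syllables(word: str) -> int:
    """Crude English heuristic (assumption): count runs of vowels, incl. 'y'.

    Real tools use pronunciation dictionaries or language-specific rules.
    Every word is counted as having at least one syllable.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease for English text; higher means easier to read."""
    # Naive sentence split on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # ASCII-only word tokenization: an English-specific simplification.
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Constants calibrated on English; adapting the formula to another
    # language would mean re-estimating them on that language's data.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
```

For the one-sentence example (6 words, 6 syllables) the score is 206.835 − 1.015·6 − 84.6·1 ≈ 116.1, i.e. "very easy" on the English scale.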
There is no such tool for Czech yet. To build one, we need a corpus of texts with comparable content, together with comprehension tests administered to a large number of readers. Since this project is associated with COST Action CA16204 Distant Reading for European Literary History, our corpus will consist of scholarly texts about 19th-century Czech fiction, tested on high-school and university students.
We will try to adapt selected readability formulas to Czech. Together with the Computational Stylistics Group, we are also going to explore whether readability is a useful stylometric feature.
This project is supported by a grant from the Czech Ministry of Education, Youth and Sports (grant information available in Czech only).