For centuries, linguists have deliberated on how to represent meaning. In recent years, this question has attracted attention not only as an intriguing theoretical challenge but also for its practical implications, since meaning representation can, in general, serve as the basis for any system that requires sound and reliable knowledge representation to support logical inference.
While numerous formalisms for meaning representation have been proposed in recent decades, this project focuses on two specific approaches: the meaning representation used in the Prague Dependency Treebank family (PDT) and Uniform Meaning Representation (UMR). The choice of the first formalism is motivated by the availability of data for Czech, particularly the PDT-C treebank. This treebank provides the most comprehensive Czech data (almost 175.5 thousand sentences across different genres) with fine-grained annotation at the tectogrammatical level, capturing linguistically structured meaning. The second approach, UMR, offers significant potential to enhance the PDT-C representation in several key ways:
The primary objective of the project is to explore the feasibility of a (semi-)automatic conversion of the PDT-C data into a format that adheres to the UMR specification. In particular, the project aims to identify:
This data release did not yet include Czech data, but we list it here for completeness. It is available from the LINDAT/CLARIAH-CZ repository at http://hdl.handle.net/11234/1-5198 and contains all the data annotated by the U.S. team.
This data release contains the first version of the Czech conversion and manually prepared Latin data, also by the ÚFAL MFF UK team. It is also available from the LINDAT/CLARIAH-CZ repository (URL TBA).
The development of Czech UMR has been supported by the following projects:
Project UMR – Uniform Meaning Representation, No. LUAUS23283, in the Inter-Excellence II program (Inter-Action subprogram), 2023-2027
The project primarily supports cooperation with the U.S. partner, preparation of the data releases, manual checks, and work on applying the SynSemClass event-type ontology to UMR.
Project LUSyD: Language Understanding: from Syntax to Discourse, GAČR EXPRO program, Project No. GX20-16819X
This project supports fundamental research on meaning representations in general, including testing various Natural Language Understanding tools, work on discourse, and laying the foundations of the SynSemClass event-type ontology. From the UMR perspective, it supports a basic understanding of the UMR principles within the broader context of meaning representations.
Project of the large research infrastructure LINDAT/CLARIAH-CZ, project No. LM2023062, MŠMT LRI program
This project provides infrastructural support for hosting the data, tools, and services developed in the UMR project, as well as related resources. It also serves as the primary distribution repository for the data developed by the U.S. partner.
The UMR for Czech is also related to the following project:
Adapting Uniform Meaning Representation (UMR) for the Italic/Romance languages, project No. 104924, GAUK