Multilingual
- PARSEME corpus - annotated corpora and tools of the PARSEME shared task on automatic identification of verbal multiword expressions (various flavors of the CC BY license):
- edition 1.3, 26 languages, UD-compatible
- edition 1.2, 14 languages
- edition 1.1, 19 languages
- edition 1.0, 18 languages
- Multilingual corpus of literal occurrences of multiword expressions (under various flavors of the CC BY license) - in Basque, German, Greek, Polish and Portuguese
- Prolexbase - multilingual database and ontology of proper names (CC BY-SA license), mostly French, Polish and English
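For readers who want to inspect the PARSEME corpus listed above, the following is a minimal sketch of how its MWE annotations might be read. It assumes the tab-separated `.cupt` layout (CoNLL-U columns followed by a final `PARSEME:MWE` column), in which `*` marks a token outside any MWE, `1:LVC.full` opens MWE number 1 with its category, and a bare `1` continues it. Check the documentation of the edition you download, since the earliest edition may use a different layout. The function names `parse_mwe_column` and `read_mwes` are hypothetical, and the sample sentence is invented for illustration, not taken from the corpus.

```python
from collections import defaultdict


def parse_mwe_column(value):
    """Split an MWE cell such as '1:VID;2' into (mwe_id, category) pairs.

    The category is None for continuation tokens, which carry only the id.
    """
    if value in ("*", "_", ""):
        return []
    pairs = []
    for part in value.split(";"):
        mwe_id, _, category = part.partition(":")
        pairs.append((int(mwe_id), category or None))
    return pairs


def read_mwes(lines):
    """Yield one dict per sentence that contains at least one MWE.

    Each dict maps an MWE id to (category, [token forms]).
    Assumes the MWE code is the last tab-separated column.
    """
    forms = defaultdict(list)
    cats = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:  # a blank line ends the current sentence
            if forms:
                yield {i: (cats.get(i), forms[i]) for i in sorted(forms)}
            forms, cats = defaultdict(list), {}
            continue
        if line.startswith("#"):  # metadata and comment lines
            continue
        cols = line.split("\t")
        for mwe_id, category in parse_mwe_column(cols[-1]):
            forms[mwe_id].append(cols[1])  # column 2 is the word form
            if category:
                cats[mwe_id] = category
    if forms:  # last sentence when the input has no trailing blank line
        yield {i: (cats.get(i), forms[i]) for i in sorted(forms)}


# Invented sample sentence, for illustration only.
SAMPLE = (
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE\n"
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\t*\n"
    "2\ttook\ttake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full\n"
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*\n"
    "4\twalk\twalk\tNOUN\t_\t_\t2\tobj\t_\t_\t1\n"
    "\n"
)

for sentence in read_mwes(SAMPLE.splitlines()):
    print(sentence)
```

Grouping tokens by MWE id rather than by adjacency keeps discontinuous expressions such as "took ... walk" together, which matters because verbal MWEs are often interrupted by other words.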
Polish
- Multi-word expressions:
- MweLitRead - a dataset of Polish MWEs and their literal readings (CC BY-SA license)
- SEJF - Grammatical Lexicon of Polish Phraseology (CC BY-SA license)
- SEJFEK - Grammatical Lexicon of Polish Economic Phraseology and its lexicalized shallow grammar version SEJFEK4Spejd (CC BY-SA license)
- Składnica-MWEs - the Polish constituency treebank Składnica enriched with MWE annotations. The annotation results from an automatic mapping of 3 MWE resources, followed by manual validation; over 2,000 MWEs are annotated in about 9,000 trees (GPL v3 license). See the reference paper for more details.
- Named entities:
- Coreference:
- PPC - Polish Coreference Corpus (CC BY v.3 license)
- See also the Polish modules in the multilingual resources above
Spanish
This resource is the result of a pilot study on the lexical description of multi-word expressions (MWEs) in four Spanish dialects from Latin America (Colombia, Costa Rica, Peru and Mexico). It is an outcome of the Business Intelligence Seminar student project carried out within the IT4BI Erasmus Mundus master's program. It is available under the 2-clause BSD license.
- XML database containing the lexical description of 100 Spanish MWEs in four dialects, together with their generic and dialect-specific properties (meaning, dialect, language register, passivization, partial inflection, etc.)
- XML schema for the database
- 255 examples of MWEs in four dialects, with word-by-word translations, idiomatic readings and some examples of usage [.xlsx.zip]
- References:
- Arauco, A., Bogantes, D., Rodríguez, A., Rodríguez, E. (2015) Representation and Identification of Multiword Expressions in different Spanish Dialects, Technical Report 314, Laboratoire d'informatique, François Rabelais University of Tours, France [bibtex].
- Bogantes, D., Rodríguez, E., Arauco, A., Rodríguez, A., Savary, A. (2015) Towards Lexical Encoding of Multiword Expressions in Spanish Dialects, in the PARSEME 5th general meeting, 23-24 September 2015, Iași, Romania (poster) [bibtex].
- Bogantes, D., Rodríguez, E., Arauco, A., Rodríguez, A., Savary, A. (2016) Towards Lexical Encoding of Multiword Expressions in Spanish Dialects, in the Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC'16), 23-28 May 2016, Portorož, Slovenia.