Turku Neural Parser Pipeline
Plain text
CoNLL-U Format (https://universaldependencies.org/format.html)
Lemmatization, Universal Dependencies, Parsing, Sequence-to-sequence model, Machine learning, Supervised
Lemmatization method based on a sequence-to-sequence neural network architecture and a morphosyntactic context representation. The context-sensitive lemmatizer generates the lemma one character at a time. It conditions on the characters of the word's surface form and on the morphosyntactic features that a morphological tagger assigns to the word. As of 2020, it outperforms all of the latest baseline systems. Compared with the best overall baseline, it performs better on 62 of 76 treebanks and reduces errors by 19% relative on average. The lemmatizer, together with all trained models, is available as part of the Turku-neural-parsing-pipeline under the Apache 2.0 license.
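The character-by-character process described above can be sketched in Python as a minimal illustration. It shows how one CoNLL-U token could be turned into a model input and how a lemma could be decoded greedily and written back into the LEMMA column. This is not the pipeline's own code. Several parts are assumptions: the input encoding (form characters followed by the UPOS tag and each FEATS value as whole symbols), the end symbol END, and the helper names parse_token_line, encoder_input, generate_lemma, fill_lemma and toy_step. All of these are hypothetical. toy_step is a hand-written stand-in for a trained decoder, included only so the loop can run.

```python
from typing import Callable, Dict, List

# The 10 CoNLL-U columns, in order, as defined by the UD format specification.
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

END = "</s>"  # hypothetical end-of-lemma symbol

# A decoder step: given the source symbols and the lemma characters produced
# so far, return the next character or END.
Step = Callable[[List[str], List[str]], str]


def parse_token_line(line: str) -> Dict[str, str]:
    """Split one CoNLL-U token line into a column-name -> value dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(CONLLU_COLUMNS):
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return dict(zip(CONLLU_COLUMNS, fields))


def encoder_input(token: Dict[str, str]) -> List[str]:
    """Assumed encoding: surface-form characters, then the tagger's UPOS
    and each FEATS value as single whole symbols."""
    symbols = list(token["FORM"])
    symbols.append("UPOS=" + token["UPOS"])
    if token["FEATS"] != "_":  # "_" marks an empty column in CoNLL-U
        symbols.extend(token["FEATS"].split("|"))
    return symbols


def generate_lemma(source: List[str], step: Step, max_len: int = 50) -> str:
    """Greedy decoding: request one character at a time, conditioned on the
    source and on the characters generated so far."""
    output: List[str] = []
    for _ in range(max_len):  # guard against a decoder that never stops
        nxt = step(source, output)
        if nxt == END:
            break
        output.append(nxt)
    return "".join(output)


def fill_lemma(line: str, step: Step) -> str:
    """Predict the lemma for one token line and write it into LEMMA."""
    token = parse_token_line(line)
    token["LEMMA"] = generate_lemma(encoder_input(token), step)
    return "\t".join(token[c] for c in CONLLU_COLUMNS)


def toy_step(source: List[str], output: List[str]) -> str:
    """Hypothetical stand-in for a trained decoder: drops a final 'n' when
    the features include Case=Gen, otherwise copies the form."""
    cut = next(i for i, s in enumerate(source) if s.startswith("UPOS="))
    form = "".join(source[:cut])
    target = form[:-1] if "Case=Gen" in source and form.endswith("n") else form
    return target[len(output)] if len(output) < len(target) else END


if __name__ == "__main__":
    # Finnish "koiran" is the genitive singular of "koira" (dog).
    line = "2\tkoiran\t_\tNOUN\t_\tCase=Gen|Number=Sing\t0\troot\t_\t_"
    print(fill_lemma(line, toy_step))
```

In a real system, step would be a trained neural decoder that scores every character in its vocabulary. Greedy decoding keeps the sketch short, and beam search is a common alternative. Because the features are appended as whole symbols, the decoder sees that "koiran" is a genitive. A model that saw only the characters would have to infer that from context.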