Turku Neural Parser Pipeline

Year: 2020
Authors: Jenna Kanerva, Filip Ginter, Tapio Salakoski
Journal: Natural Language Engineering
Languages: English, French, German, Spanish
Programming languages: Python
Input data: Plain text
Output data: CoNLL-U format (https://universaldependencies.org/format.html)


Keywords: Lemmatization, Universal Dependencies, Parsing, Sequence-to-sequence model, Machine learning, Supervised
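
For readers unfamiliar with the output format, the following is a minimal sketch of reading CoNLL-U with plain Python. It relies only on the published format (ten tab-separated columns per token, `#` comment lines, blank lines between sentences). The function name `parse_conllu` and the sample sentence are illustrative assumptions: the sample is hand-written and is not actual output of the pipeline.

```python
# Column names as defined by the CoNLL-U format specification.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts.

    Hypothetical helper for illustration; comment lines are skipped.
    """
    sentences = []
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append(dict(zip(CONLLU_FIELDS, cols)))
    if tokens:
        # Handle input that does not end with a blank line.
        sentences.append(tokens)
    return sentences


# Hand-written sample (an assumption, not real parser output).
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "NNS", "Number=Plur",
               "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "VBP",
               "Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin",
               "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for tok in parse_conllu(SAMPLE)[0]:
        print(tok["form"], tok["lemma"], tok["upos"])
```

The LEMMA column (third field) is where a lemmatizer's prediction for each token appears. This sketch does not give special treatment to multiword-token ranges or empty nodes; they are kept as ordinary rows with their original ID strings.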

A lemmatization method based on a sequence-to-sequence neural network architecture and a morphosyntactic context representation. This context-sensitive lemmatizer generates the lemma one character at a time from the characters of the surface form and the word's morphosyntactic features, which are obtained from a morphological tagger. The system outperforms all recent baseline systems (as of 2020): compared with the best overall baseline, it achieves better results on 62 of 76 treebanks and reduces errors by 19% relative on average. The lemmatizer and all trained models are available as part of the Turku-neural-parsing-pipeline under the Apache 2.0 license.
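
The input representation described above can be sketched as follows. This is a minimal illustration, assuming that the surface form is split into characters and the tagger's part-of-speech tag and morphological features are appended as extra symbols; the function names and the exact symbol format (`UPOS=...`, `Feature=Value`) are assumptions made for this example and are not taken from the pipeline's code.

```python
def build_lemmatizer_input(form, upos, feats):
    """Build a source sequence for a character-level seq2seq lemmatizer.

    Hypothetical helper: the sequence is the characters of the word form
    followed by its morphosyntactic tags as single symbols.
    """
    chars = list(form)
    tags = ["UPOS=" + upos]
    # In CoNLL-U, "_" marks an empty feature set and "|" separates features.
    if feats and feats != "_":
        tags.extend(feats.split("|"))
    return chars + tags


def lemma_target(lemma):
    """Target sequence: the lemma, one character per output step."""
    return list(lemma)


if __name__ == "__main__":
    source = build_lemmatizer_input("Dogs", "NOUN", "Number=Plur")
    target = lemma_target("dog")
    print(" ".join(source))
    print(" ".join(target))
```

Because the tags are part of the source sequence, the same surface form can map to different lemmas depending on its analysis, which is what makes the approach context-sensitive. A trained encoder-decoder model would then map the source sequence to the target sequence; that model is not part of this sketch.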
