Turku Neural Parser Pipeline

Lemmatization, Text algorithms

Year: 2020

Authors: Jenna Kanerva, Filip Ginter, Tapio Salakoski

Journal: Natural Language Engineering

Languages: English, French, German, Spanish

Programming languages: Python

Input data:

Plain text

Output data:

CoNLL-U Format (https://universaldependencies.org/format.html)
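
Because the output is CoNLL-U, the predicted lemmas can be read back with a few lines of Python. The following is a minimal sketch of a reader written against the column layout in the CoNLL-U specification (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). It is not part of the pipeline itself: the helper names `read_conllu` and `parse_feats` are hypothetical, and the sample sentence is hand-written for illustration rather than real parser output.

```python
from typing import Dict, List

# Standard CoNLL-U column order (10 tab-separated fields per word line).
COLUMNS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of word dicts keyed by COLUMNS."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (e.g. "# text = ...") hold sentence-level metadata.
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}: {line!r}")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1"):
        # only syntactic words carry a lemma in the usual sense.
        if "-" in fields[0] or "." in fields[0]:
            continue
        current.append(dict(zip(COLUMNS, fields)))
    if current:
        # Tolerate input that lacks the final blank line.
        sentences.append(current)
    return sentences


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'Mood=Ind|Tense=Past' into {'Mood': 'Ind', 'Tense': 'Past'}; '_' means no features."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


# Hand-written sample in CoNLL-U format (illustrative, not real parser output).
sample = (
    "# text = The dogs barked.\n"
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_\n"
    "2\tdogs\tdog\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t_\t_\n"
    "3\tbarked\tbark\tVERB\tVBD\tMood=Ind|Tense=Past|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
    "\n"
)

for sentence in read_conllu(sample):
    for word in sentence:
        # First line printed: The the {'Definite': 'Def', 'PronType': 'Art'}
        print(word["form"], word["lemma"], parse_feats(word["feats"]))
```

Skipping multiword-token lines keeps one entry per syntactic word, which is the unit that receives a lemma in Universal Dependencies treebanks.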

Tags:

Lemmatization, Universal Dependencies, Parsing, Sequence-to-sequence model, Machine learning, Supervised

Lemmatization method based on a sequence-to-sequence neural network architecture combined with a morphosyntactic context representation. This context-sensitive lemmatizer generates the lemma one character at a time, conditioned on the characters of the word's surface form and on the word's morphosyntactic features as predicted by a morphological tagger. At the time of publication (2020), it outperformed all of the recent baseline systems it was evaluated against. Compared with the best overall baseline, it performs better on 62 out of 76 treebanks and reduces errors by 19% relative on average. The lemmatizer, together with all trained models, is available as part of the Turku Neural Parser Pipeline under the Apache 2.0 license.
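
The description above says the lemmatizer reads the surface-form characters together with the word's morphosyntactic features and writes the lemma character by character. The following Python sketch shows one way such (source, target) training pairs could be prepared for a character-level sequence-to-sequence model. The symbol format (e.g. `UPOS=VERB`), the helper names `make_example` and `Vocab`, and the special symbols `<pad>`, `<s>`, `</s>`, `<unk>` are hypothetical choices for illustration, not the Turku pipeline's actual data format; the neural encoder-decoder itself is not shown.

```python
from typing import Dict, List, Tuple


def make_example(form: str, upos: str, xpos: str, feats: str, lemma: str) -> Tuple[List[str], List[str]]:
    """Build one (source, target) pair for a character-level seq2seq lemmatizer.

    Assumed (hypothetical) layout: the source is the surface form split into
    characters, followed by one symbol per morphosyntactic tag; the target is
    the lemma split into characters.
    """
    source = list(form)
    source.append(f"UPOS={upos}")
    source.append(f"XPOS={xpos}")
    if feats != "_":
        # Each feature (e.g. "Tense=Past") becomes one atomic symbol. Tag symbols
        # are multi-character strings, so they cannot collide with single characters.
        source.extend(feats.split("|"))
    target = list(lemma)
    return source, target


class Vocab:
    """Map symbols to integer ids, reserving ids for padding and sequence boundaries."""

    def __init__(self) -> None:
        self.itos: List[str] = ["<pad>", "<s>", "</s>", "<unk>"]
        self.stoi: Dict[str, int] = {s: i for i, s in enumerate(self.itos)}

    def add(self, symbols: List[str]) -> None:
        for s in symbols:
            if s not in self.stoi:
                self.stoi[s] = len(self.itos)
                self.itos.append(s)

    def encode(self, symbols: List[str]) -> List[int]:
        # Wrap the sequence in start/end markers; unseen symbols map to <unk>.
        unk = self.stoi["<unk>"]
        return [self.stoi["<s>"]] + [self.stoi.get(s, unk) for s in symbols] + [self.stoi["</s>"]]

    def decode(self, ids: List[int]) -> str:
        """Join predicted characters, stopping at the end-of-sequence symbol."""
        chars: List[str] = []
        for i in ids:
            sym = self.itos[i]
            if sym == "</s>":
                break
            if sym not in ("<pad>", "<s>"):
                chars.append(sym)
        return "".join(chars)


src, tgt = make_example("barked", "VERB", "VBD", "Mood=Ind|Tense=Past|VerbForm=Fin", "bark")
src_vocab, tgt_vocab = Vocab(), Vocab()
src_vocab.add(src)
tgt_vocab.add(tgt)
print(src_vocab.encode(src))  # [1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 2]
print(tgt_vocab.decode(tgt_vocab.encode(tgt)))  # bark
```

Keeping the features as whole symbols rather than spelling them out keeps the source sequence short and lets the model condition on each feature individually; whether the original system does exactly this is an assumption of the sketch.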
