Stanza
Year: 2020
Journal: Stanford University
Languages: Afrikaans, Ancient Greek, Arabic, Armenian, Basque, Belarusian, Bulgarian, Buryat, Catalan, Chinese (simplified), Chinese (traditional), Classical Chinese, Coptic, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Kazakh, Korean, Kurmanji, Latin, Latvian, Lithuanian, Livvi, Maltese, Marathi, North Sami, Norwegian (Bokmaal), Norwegian (Nynorsk), Old Church Slavonic, Old French, Old Russian, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Swedish Sign Language, Tamil, Telugu, Turkish, Ukrainian, Upper Sorbian, Urdu, Uyghur, Vietnamese, Wolof
Programming languages: Python
Input data: Plain text
Output data: Annotations (tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition)
Project website: https://stanfordnlp.github.io/stanza/
Stanza provides a language-agnostic, fully neural pipeline for text analysis that covers tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.
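As a rough illustration of how plain text goes in and annotations come out, here is a minimal sketch using the Python package. The helper names annotate, word_rows, and entity_rows are hypothetical and exist only for this example. The sketch assumes the stanza package is installed and that the chosen language ships default models for every processor (named entity recognition is not available for all languages).

```python
def annotate(text, lang="en"):
    """Run Stanza's default pipeline for `lang` over plain text."""
    # Imported lazily so the formatting helpers below work without stanza.
    import stanza

    stanza.download(lang)        # fetches the language's models on first use
    nlp = stanza.Pipeline(lang)  # loads the default processors for `lang`
    return nlp(text)


def word_rows(doc):
    """Flatten an annotated document into one tuple per syntactic word.

    Each tuple holds: text, lemma, universal POS tag, morphological
    features (may be None), index of the head word, dependency relation.
    """
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            rows.append(
                (word.text, word.lemma, word.upos,
                 word.feats, word.head, word.deprel)
            )
    return rows


def entity_rows(doc):
    """Return (text, type) pairs for the named entities in the document."""
    return [(ent.text, ent.type) for ent in doc.ents]
```

For example, doc = annotate("Stanza was developed at Stanford University.") followed by word_rows(doc) and entity_rows(doc) would list the per-word annotations and any recognized entities. The first call downloads the models, so it needs network access.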