Universal Dependencies Pipeline
Year: 2016
Journal: International Conference on Language Resources and Evaluation
Languages: Ancient Greek, Ancient Greek-PROIEL, Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, Finnish-FTB, French, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese-KTC, Latin, Latin-ITTB, Latin-PROIEL, Norwegian, Old Church Slavonic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil
Programming languages: C++
Input data: CoNLL-U formatted files
Project website: https://ufal.mff.cuni.cz/udpipe, https://ufal.mff.cuni.cz/udpipe/1
UDPipe, a pipeline processing CoNLL-U-formatted files, performs tokenization, morphological analysis, part-of-speech tagging, lemmatization and dependency parsing for nearly all treebanks of Universal Dependencies 1.2 (namely, the whole pipeline is currently available for 32 out of 37 treebanks). In addition, the pipeline is easily trainable with training data in CoNLL-U format (and in some cases also with additional raw corpora) and requires minimal linguistic knowledge on the users’ part.
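Since the whole pipeline reads and writes CoNLL-U, a small illustration of that format may help. The following is a minimal sketch in Python, not part of UDPipe itself: it assumes the standard ten tab-separated CoNLL-U columns, comment lines starting with "#", and a blank line after each sentence. The function names read_conllu and dependency_edges and the SAMPLE sentence are hypothetical, introduced only for this example.

```python
from typing import Dict, Iterator, List, Tuple

# The ten tab-separated CoNLL-U columns, in order (names chosen for this sketch).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hypothetical sample: one sentence as a tagger and parser might emit it.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by FIELDS."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments are skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # Tolerate a missing blank line at end of input.
        yield sentence


def dependency_edges(sentence: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (dependent form, relation, head form) for each syntactic word."""
    forms = {tok["id"]: tok["form"] for tok in sentence}
    edges = []
    for tok in sentence:
        if not tok["id"].isdigit():
            # Skip multiword-token ranges such as "1-2" (they carry no head).
            continue
        head = "ROOT" if tok["head"] == "0" else forms.get(tok["head"], "?")
        edges.append((tok["form"], tok["deprel"], head))
    return edges


if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        for dependent, relation, head in dependency_edges(sent):
            print(f"{dependent} --{relation}--> {head}")
```

Each pipeline stage fills in some of these columns: tokenization supplies ID and FORM, tagging and lemmatization supply LEMMA, the part-of-speech columns and FEATS, and parsing supplies HEAD and DEPREL. Columns a stage has not produced are left as "_".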