Universal Dependencies Pipeline

Year: 2016
Journal: International Conference on Language Resources and Evaluation
Languages: Ancient Greek, Ancient Greek-PROIEL, Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, Finnish-FTB, French, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese-KTC, Latin, Latin-ITT, Latin-PROIEL, Norwegian, Old Church Slavonic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil
Programming languages: C++
Input data: CoNLL-U formatted files

UDPipe, a pipeline processing CoNLL-U-formatted files, performs tokenization, morphological analysis, part-of-speech tagging, lemmatization and dependency parsing for nearly all treebanks of Universal Dependencies 1.2 (namely, the whole pipeline is currently available for 32 out of 37 treebanks). In addition, the pipeline is easily trainable with training data in CoNLL-U format (and in some cases also with additional raw corpora) and requires minimal linguistic knowledge on the users' part.
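To make the CoNLL-U input format concrete, here is a minimal sketch of a reader in plain Python. It does not use UDPipe or its API. It relies only on the layout defined by the CoNLL-U specification: ten tab-separated columns per token, comment lines starting with `#`, and a blank line after each sentence. The function name `read_conllu`, the `FIELDS` list and the sample sentence are hypothetical and serve only as illustration.

```python
# Column names as defined by the CoNLL-U specification (UD 1.x naming).
FIELDS = ["id", "form", "lemma", "upostag", "xpostag",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts.

    Hypothetical helper for illustration; not part of UDPipe.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        else:
            columns = line.split("\t")
            if len(columns) != len(FIELDS):
                raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
            current.append(dict(zip(FIELDS, columns)))
    if current:
        # Tolerate a missing blank line after the last sentence.
        sentences.append(current)
    return sentences


# Made-up three-token sentence; "_" marks an unspecified value.
SAMPLE = (
    "# hypothetical example sentence\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sentence in read_conllu(SAMPLE):
        for token in sentence:
            print(token["id"], token["form"], token["upostag"],
                  token["head"], token["deprel"])
```

Each token dictionary exposes the layers the pipeline is described as producing: `form` from tokenization, `lemma` from lemmatization, `upostag` and `feats` from tagging and morphological analysis, and `head` with `deprel` from dependency parsing. The sketch keeps every value as a string and does not treat multiword-token ranges (IDs such as `1-2`) specially.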
