Venue: Conference on Empirical Methods in Natural Language Processing
Languages: Arabic, Czech, English, German, Hungarian, Spanish
Programming languages: Java
“unvocalized and pretokenized transliterations as input”
In this paper, we demonstrate that fast and accurate CRF training and tagging are possible even for large tagsets of thousands of tags, by approximating the CRF objective function using coarse-to-fine decoding. Our pruned CRF (PCRF) model has a much lower runtime than higher-order CRF models and may thus lead to an even broader application of CRFs across NLP tagging tasks.
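The core idea behind coarse-to-fine decoding can be sketched as follows: a cheap coarse model scores every tag at each position, and tags whose score falls below a fraction of the position's best score are pruned away, shrinking the lattice that the expensive higher-order CRF must search. This is a minimal, hypothetical sketch of that pruning step (not the paper's actual implementation); the class name `CoarseToFinePruning`, the method `pruneTags`, and the score values are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class CoarseToFinePruning {
    /**
     * Keep only the tags whose coarse-model score is within `ratio`
     * of the best score at this position; all others are pruned from
     * the lattice before the higher-order CRF runs.
     */
    public static List<Integer> pruneTags(double[] tagScores, double ratio) {
        double best = Double.NEGATIVE_INFINITY;
        for (double s : tagScores) {
            best = Math.max(best, s);
        }
        List<Integer> kept = new ArrayList<>();
        for (int t = 0; t < tagScores.length; t++) {
            if (tagScores[t] >= best * ratio) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative coarse-model scores for five tags at one position.
        double[] scores = {0.70, 0.20, 0.05, 0.04, 0.01};
        // Keep tags scoring at least 10% of the best score.
        List<Integer> kept = pruneTags(scores, 0.1);
        System.out.println(kept); // only tags 0 and 1 survive pruning
    }
}
```

With an aggressive threshold most positions keep only a handful of candidate tags, which is what makes higher-order decoding over a tagset of thousands of tags tractable.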