T-Pos
Year: 2011
Journal: Conference on Empirical Methods in Natural Language Processing
Languages: English
Programming languages: Python
Input data:
If the input is a tab-separated file, use the i-th column (counting from 0) as the text column to read from. In the output file, that column's data is replaced with the annotated text.
CAUTION: Make sure there are no newline characters in the text column. This will break the format.
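The tab-separated workflow can be sketched in plain Python. This is a minimal illustration, not twitter_nlp's own code. It assumes one record per line and a 0-based column index. The names `flatten_newlines`, `replace_text_column` and `fake_annotate` are all hypothetical. `fake_annotate` is only a placeholder for the real tagger and does not call T-Pos.

```python
from typing import Callable, Iterable, Iterator


def flatten_newlines(text: str) -> str:
    """Collapse tabs and line breaks to single spaces so a record stays on one TSV line."""
    return " ".join(text.replace("\t", " ").splitlines())


def replace_text_column(
    lines: Iterable[str], col: int, annotate: Callable[[str], str]
) -> Iterator[str]:
    """Yield each TSV row with column `col` (0-based) replaced by annotate(text)."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if not 0 <= col < len(fields):
            raise IndexError(f"column {col} missing in row {line!r}")
        fields[col] = annotate(fields[col])
        yield "\t".join(fields)


def fake_annotate(text: str) -> str:
    """Hypothetical stand-in for the tagger: labels every whitespace token 'O'."""
    return " ".join(f"{word}/O" for word in text.split())


if __name__ == "__main__":
    # Clean the text column *before* writing the input file, per the caution above.
    rows = ["1\thello world\tx\n", "2\t" + flatten_newlines("good\nnight") + "\ty\n"]
    for row in replace_text_column(rows, 1, fake_annotate):
        print(row)
```

An embedded newline would split one record into two lines when the file is read back. The cleaning step therefore belongs on the input side, before the tagger ever sees the file. Tabs are collapsed for the same reason, since a stray tab would shift the column positions.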
Output data:
The output contains the tokenized, tagged words separated by spaces. Within each token, the word and its tags are separated by forward slashes ‘/’.
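One way to read this format back into Python is sketched below under stated assumptions. It assumes every token carries the same known number of tags, which depends on the annotations requested. It also assumes that no tag itself contains ‘/’. The function name `parse_tagged_line` and the tag value `O` in the example are illustrative only.

```python
from typing import List, Tuple


def parse_tagged_line(line: str, n_tags: int) -> List[Tuple[str, List[str]]]:
    """Split a space-separated line of word/TAG[/TAG...] tokens into (word, tags) pairs.

    Splits from the right, so a slash inside the word itself (e.g. a URL) is kept.
    Assumes every token carries exactly `n_tags` tags and no tag contains '/'.
    """
    result = []
    for item in line.split():
        parts = item.rsplit("/", n_tags)
        if len(parts) != n_tags + 1:
            raise ValueError(f"expected {n_tags} tag(s) in token {item!r}")
        result.append((parts[0], parts[1:]))
    return result


if __name__ == "__main__":
    print(parse_tagged_line("see/O http://t.co/x/O", 1))
```

The parser splits from the right with a fixed `maxsplit`, because tweets often contain URLs. A plain `split("/")` would break a word such as `http://t.co/x` into several pieces.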
Project website: https://github.com/aritter/twitter_nlp
The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition.